Abstract: This paper proposes a practical successive decoding
scheme with finite levels for the finite-state Markov channels 
where there is no a priori state information at the transmitter 
or the receiver. The design employs either a random interleaver 
or a deterministic interleaver with an irregular pattern and an 
optional iterative estimation and decoding procedure within each 
level. The interleaver design criteria may be the achievable rate 
or the extrinsic information transfer (EXIT) chart, depending 
on the receiver type. For random interleavers, the optimization 
problem is solved efficiently using a pilot-utility function, while 
for deterministic interleavers, a good construction is given using 
empirical rules. Simulation results demonstrate that the new
successive decoding scheme combined with irregular low-density
parity-check codes can approach the independently and uniformly
distributed (i.u.d.) input capacity on the Markov-fading channel
using only a few levels.

Index Terms — Capacity, decision feedback, fading channel, 
finite-state Markov channel, low-density parity-check (LDPC) 
codes, Markov channel, multistage decoding, mutual information, 
successive decoding. 

I. Introduction 

Many realistic communication systems suffer from unknown
and time-varying channel conditions. The traditional
strategy is single-code transmission with joint estimation and
decoding. For example, iterative channel estimation and decoding
was used in [1-3] for flat-fading channels, and iterative
equalization and decoding, so-called turbo equalization, was
used in [4] for inter-symbol interference (ISI) channels. With
recent advances in low-density parity-check (LDPC) codes,
channel estimation and decoding are combined into message
passing over a joint factor graph of the channel and the code;
see [5] and [6] for block fading channels, [7] and [8] for ISI
channels, and [9] for Markov channels. In addition, codes need
to be specifically optimized for the structure of the channel
and the estimator. Various density evolution techniques have
been proposed to find the optimal degree sequence of irregular
LDPC codes for different channels [5-9]. Although shown to
perform well for relatively short ISI channels [7], [8], this
approach still has a performance gap for fading channels and
Markov channels, and its optimality for a general channel with
memory is yet to be established.

An alternative strategy is successive (or multistage) decoding
with multiple codes. This technique uses a rectangular
interleaver to multiplex $K$ independent codes into a single
transmission stream at the transmitter and decodes them
sequentially at the receiver. Successive decoding was originally
developed to approach capacity for multilevel modulations
[10], [11] and multiuser channels [12]. When applied to
channels with memory, it effectively decomposes the physical
channel into a bank of $K$ subchannels (levels) with weaker
memory and additional decision feedback, where simplified
algorithms, such as separate estimation and decoding (SED),
and suboptimal codes, may perform well.

There has been extensive research on this topic. Pfister et al.
first studied the achievable rate of successive decoding with
SED for the ISI channel in [13] and the actual codes were then
constructed by Soriaga et al. in [14]. Varnica et al. in [15] and
Kavčić et al. in [16] adopted the successive decoding schedule
for designing the component LDPC codes while performing
the actual iterative decoding on the joint graph. A simplified
scheme using only one estimator and one code of fixed rate for
all subchannels was proposed by Narayanan and Nangare for
ISI channels in [17] and by Li and Collins for correlated fading
channels and other channels with memory in [18]. A pair
of tight upper and lower bounds for the binary-input
fading-channel capacity was derived in [19] and codes were designed
to perform very close to the upper bound in [18] and [20].

Previous research has focused on asymptotic designs of
successive decoding. As the number of levels $K \to \infty$, the
subchannels become memoryless and identical [17], [18], and
the simple SED algorithm and the memoryless-channel-optimized
component code may have near-optimal performance
on ISI channels [14], [17] and fading channels [18]. However,
if a large number of codes (levels) is not allowed for practical
reasons, the existing multi-rate designs in [13] and [14] and the
rectangular interleavers in [14], [17], and [18] are no longer
optimal.

This paper addresses the analysis and design of a more
practical successive decoding scheme under the finite-level
constraint. Since for a small $K$ the subchannels are no longer
memoryless, we employ iterative estimation and decoding
(IED) at each level to exploit the residual memory. More
importantly, we propose an irregular interleaving pattern so
that the $K$ codes may have different lengths and irregular
symbol placements in the multiplexed transmission stream. In
this configuration, the single-code iterative schemes [7-9] and
those with pilot symbols [1-3,5,6] become two special cases of
successive decoding with one and two levels, respectively. This
framework offers more design freedom to trade off
performance against complexity. The deterministic irregular
interleaver is difficult to optimize because there are $K^N$ possible
patterns for an $N$-bit long $K$-level interleaver. Therefore, we
propose a random interleaver specified by the weight
distribution $\mathbf{w} = [w_1, \dots, w_K]$, where $w_k > 0$ and
$\sum_{k=1}^{K} w_k = 1$. It acts as a multiplexer that chooses a bit
from code $k$ to transmit with probability $w_k$. The resulting
$N$-bit sequence is expected to have approximately $w_k N$ bits
from level $k$ at random positions. The random interleaver is
much easier to design and is asymptotically optimal. It also
provides a new interpretation of the area property of the
extrinsic information transfer (EXIT) function [21].

This paper develops the successive decoding scheme for
finite-state Markov channels (FSMCs) whose state evolves
independently of the channel input. FSMCs are good
approximations to many realistic channels and have been extensively
investigated [22]. For simplicity we consider the independently
and uniformly distributed (i.u.d.) binary channel input.
Consequently, we use the maximal achievable information rate when
the channel input is an i.u.d. binary sequence as the performance
measure. This information rate is called the i.u.d. capacity
$C^{\mathrm{i.u.d.}}$ in this paper, following the notations in [7] and [8].
It is also known as the symmetric information rate [13]. In
order to achieve the channel capacity $C \ge C^{\mathrm{i.u.d.}}$, the i.u.d.
binary sequence may be passed through a nonlinear device
with memory, such as the inner nonlinear trellis encoder [16],
to mimic the optimal distribution. The designs in this paper are
readily applicable to the concatenation of this nonlinear device
and the original physical channel as well as other channels
with memory after modifying the estimator.

The proposed successive decoding technique is analyzed by 
comparing the achievable rate (of the SED algorithm) to the 
i.u.d. capacity. After expressing the rate difference in terms of 
the state-transition matrix of the underlying Markov channel, 
we show that the achievable rate goes to the i.u.d. capacity 
exponentially fast as $K \to \infty$ for both the rectangular
interleaver and the equal-weighted random interleaver. However,
when K is small, these two rates diverge. The difference 
between them comes from the mutual information loss caused 
by the memoryless-channel assumption in the SED and can be 
recovered with a more sophisticated receiver. In the literature, 
two conceptually different Monte-Carlo methods are used to 
estimate the i.u.d. capacity [13], [23] and the achievable rate 
[13], [8]. We propose a unified way to estimate both of them 
from the output a posteriori probability of the BCJR algorithm 
[24] by changing the distribution of the a priori information. 

The mutual information analysis is used for system design 
as well. The design objective is to maximize the supported 
information rate of a finite-level successive decoding scheme 
through interleaver optimization. We will adopt the existing set 
of AWGN-channel optimized LDPC codes [25] and make no 
attempt to optimize them for the specific channel and receiver.
The code rate, however, needs to be properly chosen according
to some mutual information measure at each level. For the
SED, we use the achievable rate, while for the IED, we use
the maximal code rate at which the EXIT chart analysis [26] 
still predicts the convergence of the iterative process. These 
rates also become the objective functions in optimization. 



For random interleavers, both the achievable rate and the 
EXIT function can be efficiently estimated from a so-called 
pilot-utility function, which measures the achievable rate of 
a single-code system as a function of the percentage of 
randomly-placed pilot symbols. We show that the subchannel 
achievable rate is a point on the pilot-utility function and 
the overall achievable rate is the area of a K-step piecewise 
constant curve beneath it. The EXIT function of the subchan- 
nel estimator is simply the transformation of a segment on 
the pilot-utility function. Consequently, the achievable rate of 
the SED algorithm can be maximized semi-analytically by 
a recursive method. The optimal weight distribution of the 
random interleaver under the IED is also found by matching a
set of subchannel EXIT functions to the set of decoder EXIT 
functions so that the overall code rate is maximized. Essen- 
tially, instead of designing codes for the channel according 
to the traditional wisdom [5-9], we design the subchannel to 
match the code through interleaver optimization. However, for 
deterministic interleavers, computationally intensive Monte- 
Carlo simulation is required to estimate the mutual informa- 
tion. Hence, we will only optimize a class of interleavers that 
are empirically good. 

If a more stringent overall-delay constraint is imposed, the 
effect of the finite codeword length must be considered. As the 
number of levels increases, the achievable rate will increase 
but the code lengths and their performance will decrease. 
There is an optimal number of levels for a given delay 
constraint. We use the random-coding bound [27] that relates 
the word-error probability to the codeword length to study 
this trade-off. Numerical results show that for the example 
channel, it is beneficial to use more than two levels, provided 
that a moderate delay of several thousand symbols is allowed. 

The outline of the paper is as follows. Section II introduces
the basic concepts of successive decoding. In Section III we
present the definitions of the i.u.d. capacity and the achievable
rate of the subchannel and show the convergence of the achievable
rate to the i.u.d. capacity for the rectangular interleaver.
Section IV deals with the design of interleavers for the SED.
We first discuss the properties and the asymptotic optimality
of random interleavers. We then introduce the pilot-utility
function and its properties and apply them to interleaver
optimization. A set of good deterministic interleavers is also
given. Section V proposes the EXIT chart analysis of the IED
algorithm for system design. Section VI uses the random-coding
error exponents to analyze the impact of a finite delay
constraint. Section VII presents some numerical results and
Section VIII concludes the paper.

II. Successive Decoding 

A. Channel Model 

This paper considers a Markov-modeled flat-fading channel
with additive white Gaussian noise (AWGN). The received
signal $Y_t$ is given by

$$Y_t = H_t X_t + W_t \tag{1}$$

where $H_t$, $X_t$, and $W_t$ are the complex channel gain unknown
to both the receiver and the transmitter, the transmitted symbol,
and the AWGN, respectively, and are assumed to be mutually
independent. The input is an independently and uniformly
distributed (i.u.d.) binary sequence, $X_t \in \{-1,+1\}$, with
power $E_s = 1$. The AWGN has a symmetric complex
Gaussian distribution, $W_t \sim \mathcal{CN}(0, N_0)$. The channel state
process forms an irreducible, aperiodic, and stationary Markov
chain over a finite state space $H_t \in \{A_1, \dots, A_Q\}$ with state
transition probability

$$P_{q',q} = \Pr(H_t = A_{q'} \mid H_{t-1} = A_q) \tag{2}$$

and stationary state probability

$$P_q = \Pr(H_t = A_q)$$

where $q, q' = 1, \dots, Q$. We call $\mathbf{P} = [P_{q',q}]_{q',q=1,\dots,Q}$ the
state-transition matrix. We use upper-case letters for random
variables, lower-case ones for their realizations, and boldface
letters for vectors. We use the notation $\Pr(\cdot)$ for both the
probability mass function and the probability density function
(PDF).
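The channel model (1)-(2) can be simulated directly. The following is a minimal sketch, not the authors' simulator: the function names, the two-state matrix, the gains, and the noise level used in the usage note are illustrative assumptions. The matrix is stored as `P[q2][q1]`, matching the column-stochastic convention of (2).

```python
import math
import random

def stationary(P, iters=500):
    """Stationary distribution of a column-stochastic P via power iteration."""
    Q = len(P)
    p = [1.0 / Q] * Q
    for _ in range(iters):
        p = [sum(P[q2][q1] * p[q1] for q1 in range(Q)) for q2 in range(Q)]
    return p

def simulate_fsmc(P, gains, n0, n, rng):
    """Draw n samples of Y_t = H_t X_t + W_t; returns (x, state indices, y)."""
    Q = len(P)
    state = rng.choices(range(Q), weights=stationary(P))[0]
    sigma = math.sqrt(n0 / 2.0)  # noise std per real dimension, total variance N0
    xs, states, ys = [], [], []
    for _ in range(n):
        x = rng.choice((-1, 1))  # i.u.d. binary input
        w = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        xs.append(x)
        states.append(state)
        ys.append(gains[state] * x + w)
        # next state is drawn from column `state` of P, as in (2)
        state = rng.choices(range(Q), weights=[P[q2][state] for q2 in range(Q)])[0]
    return xs, states, ys
```

For instance, `simulate_fsmc([[0.9, 0.2], [0.1, 0.8]], [1.0, 0.3], 0.5, 1000, random.Random(1))` draws 1000 symbols from a hypothetical two-state channel with a good state (gain 1.0) and a bad state (gain 0.3).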

B. Encoding 

In a $K$-level successive decoding scheme, the transmitter
partitions the information bits into $K$ sub-sequences according
to an interleaving pattern, and independently encodes them
into $K$ codewords of length $N_k$ bits and rate $r_k$ for
$k = 1, \dots, K$. Let $\mathbf{x}_k = \{x_{1,k}, \dots, x_{N_k,k}\}$ denote
codeword $k$. A total of $N = \sum_{k=1}^{K} N_k$ bits from
$\mathbf{x}_1, \dots, \mathbf{x}_K$ are then interleaved into a transmission
stream $\{x_t\}_{t=1}^{N}$ with an overall rate of
$r = \sum_{k} r_k N_k / N$. We assume the limit
$w_k = \lim_{N \to \infty} N_k / N$, called the weight of the $k$th level, exists.

We assume there is another interleaving (deinterleaving)
mechanism embedded at the encoder (decoder). Hence the
interleaver considered here does not scramble the bits within
an individual codeword and may be represented by a vector

$$\boldsymbol{\pi} = [\pi_1, \dots, \pi_N], \quad \pi_t \in \{1, \dots, K\} \text{ for } t = 1, \dots, N.$$

This means a bit from codeword $\pi_t$ is transmitted at time
instant $t$ so that $x_t = x_{i,\pi_t}$ for some $i$. The one-to-one mapping
between the pair of indices $(i, k)$ and the time index $t$ at
which $x_{i,k}$ is transmitted is conveniently represented by a
function $t = t(i,k)$ so that $x_{i,k} = x_{t(i,k)}$. For example, if
the interleaver is $\boldsymbol{\pi} = [1,3,2,3,2,3]$, then the transmitted
sequence is $[x_1, \dots, x_6] = [x_{1,1}, x_{1,3}, x_{1,2}, x_{2,3}, x_{2,2}, x_{3,3}]$, and
$t(1,1) = 1$, $t(1,2) = 3$, $t(2,2) = 5$, and so on.
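The index mapping $t(i,k)$ can be tabulated with a single pass over the interleaver vector. This is a small illustrative sketch; the helper name `index_map` is an assumption, and all indices are 1-based to match the text.

```python
def index_map(pi):
    """Return {(i, k): t} for interleaver vector pi, with 1-based i, k, and t."""
    seen = {}   # number of symbols of each codeword placed so far
    t_of = {}
    for t, k in enumerate(pi, start=1):
        seen[k] = seen.get(k, 0) + 1
        t_of[(seen[k], k)] = t
    return t_of

t_of = index_map([1, 3, 2, 3, 2, 3])
```

With the example interleaver from the text, `t_of[(1, 1)]`, `t_of[(1, 2)]`, and `t_of[(2, 2)]` give the time instants of $x_{1,1}$, $x_{1,2}$, and $x_{2,2}$.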

The interleaver configuration is key to the successive decod- 
ing design since it determines the channel configuration for 
each codeword. Both a deterministic and a random interleaver 
construction will be considered here. 

Definition 1: A $K$-level deterministic interleaver is
constructed from the repetition of a subpattern $\boldsymbol{\omega} \in \mathbb{K}^L$ of
a fixed length $L$ so that $\boldsymbol{\pi} = [\boldsymbol{\omega}, \dots, \boldsymbol{\omega}] \in \mathbb{K}^N$, where
$\mathbb{K} = \{1, \dots, K\}$.

A simple deterministic interleaver is the rectangular
interleaver with $\boldsymbol{\omega} = [1, \dots, K]$. It is asymptotically optimal [18]
as $K \to \infty$, but the free parameter $K$ may be very large for
good performance. A more general deterministic interleaver,
called the irregular interleaver [28], is a permutation
$\boldsymbol{\omega} = \mathrm{perm}(\mathbf{1}_{L_1}, \mathbf{2}_{L_2}, \dots, \mathbf{K}_{L_K}) \in \mathbb{K}^L$,
where $\mathbf{k}_n = [k, \dots, k]$ is a row vector of length $n$ and
$\sum_{k=1}^{K} L_k = L$. Here perm denotes
a permutation function. The pilot-symbol assisted modulation
(PSAM) can be viewed as a special case of successive decoding
with only two levels, i.e., $\boldsymbol{\omega} = [1, 2, \dots, 2]$.

Contrary to the rectangular interleaver, the irregular
interleaver has a design space of $K^L$ possible subpatterns $\boldsymbol{\omega}$,
making optimization difficult. This partially motivates a class
of random interleavers defined as follows.

Definition 2: A $K$-level random interleaver of weight
distribution $\mathbf{w} = [w_1, \dots, w_K]$ is a random vector
$\boldsymbol{\Pi} = [\Pi_1, \dots, \Pi_N]$ whose entries are independently and identically
distributed (i.i.d.) according to the probability mass function
$\Pr(\Pi_t = k) = w_k$ for $1 \le t \le N$ and $1 \le k \le K$, where
$w_k > 0$ and $\sum_{k=1}^{K} w_k = 1$.

The random interleaver is completely specified by the
weight distribution. Its optimization can be carried out over a
$K$-dimensional space and is much more feasible. Its properties
will be discussed in Section IV.
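Definitions 1 and 2 translate into a few lines of code. The sketch below is illustrative only: the function names, the weight vector, and the length are assumptions, and the deterministic version simply truncates the last repetition if the requested length is not a multiple of $L$.

```python
import random

def deterministic_interleaver(subpattern, n):
    """Repeat a length-L subpattern until n entries are produced (Definition 1)."""
    L = len(subpattern)
    return [subpattern[t % L] for t in range(n)]

def random_interleaver(w, n, rng):
    """Draw n i.i.d. level indices with Pr(level = k) = w[k-1] (Definition 2)."""
    levels = list(range(1, len(w) + 1))
    return rng.choices(levels, weights=w, k=n)

w = [0.1, 0.3, 0.6]                      # hypothetical weight distribution
pi = random_interleaver(w, 20000, random.Random(0))
fractions = [pi.count(k) / len(pi) for k in (1, 2, 3)]
psam = deterministic_interleaver([1, 2, 2, 2], 8)   # two-level PSAM-style pattern
```

For a long realization, `fractions` should be close to `w`, which reflects the statement that roughly $w_k N$ bits of level $k$ appear at random positions.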

C. Estimation and Decoding 

In successive decoding, the K codewords are decoded one 
by one. Their hard-decisions are fed back to subsequent levels 
and are treated as known training symbols. Thus more training 
symbols are produced as decoding proceeds to higher levels. 
Each stage performs either separate estimation and decoding 
(SED) or iterative estimation and decoding (IED). The first
approach estimates the codeword symbols over the trellis of 
the underlying FSMC, and then invokes the decoder with the 
set of likelihood ratios as its sole input. By separating the 
two processes, it ignores the interaction between the channel 
memory and the code structure and thus has a performance 
penalty. Nonetheless, SED for a deep rectangular interleaver is 
shown in [14], [17], and [18] to be i.u.d. capacity approaching 
because the underlying subchannel tends to be memoryless as 
the number of levels goes to infinity. However, when the design 
has finite levels, the codeword symbols may be closely placed. 
It is then necessary to address jointly the memory of both 
the channel and the encoder. The IED is a computationally
efficient way to do so and is widely used in the literature, see 
for example [1-4]. 

Consider the decoding process of the codeword $\mathbf{x}_k$. Assume
codewords $\mathbf{x}_1$ to $\mathbf{x}_{k-1}$ have been correctly decoded and
become the set of known training symbols denoted by a
sequence $\mathbf{u}_k = [u_1, \dots, u_N]$ where

$$u_t = \begin{cases} x_t, & \text{if } x_t \text{ is from codewords } \mathbf{x}_1 \text{ to } \mathbf{x}_{k-1} \\ \phi, & \text{otherwise.} \end{cases}$$

Here $\phi$ denotes an erased symbol. The receiver estimates
the channel state along the trellis of the FSMC using a
BCJR algorithm similar to [24]. For notational convenience,
we introduce a windowing operation $(\cdot)_{t_1}^{t_2}$, where $t_1$ and
$t_2$ are the start and end time of the window, respectively.
Suppose $\mathbf{a} = [a_1, \dots, a_N]$ is a vector indexed by time; then
$(\mathbf{a})_{t_1}^{t_2} = [a_{t_1}, \dots, a_{t_2}]$. Similarly, the windowing operation
on the codeword $\mathbf{X}_k = [X_{1,k}, \dots, X_{N_k,k}]$ yields
$(\mathbf{X}_k)_{t_1}^{t_2} = [X_{i_1,k}, \dots, X_{i_2,k}]$, where
$t(i_1 - 1, k) < t_1 \le t(i_1, k)$ and
$t(i_2, k) \le t_2 < t(i_2 + 1, k)$. Let the forward and backward
state probabilities, respectively, be

$$\alpha_t(q) = \Pr\big(H_t = A_q, (\mathbf{y})_1^{t-1}, (\mathbf{u}_k)_1^{t-1}\big) \tag{3}$$
$$\beta_t(q) = \Pr\big((\mathbf{y})_t^{N}, (\mathbf{u}_k)_t^{N} \mid H_t = A_q\big). \tag{4}$$

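Define the branch metric to reflect whether a known symbol
is present at time instant $t$ as

$$\gamma_t(q',q) = \begin{cases} \Pr(y_t, x_t, H_{t+1} = A_{q'} \mid H_t = A_q), & \text{if } x_t \text{ is known} \\ \Pr(y_t, H_{t+1} = A_{q'} \mid H_t = A_q), & \text{otherwise} \end{cases}
= \begin{cases} \gamma_t(q',q,x_t), & \text{if } x_t \text{ is known} \\ \sum_{x_t = \pm 1} \gamma_t(q',q,x_t), & \text{otherwise} \end{cases} \tag{5}$$

where $\gamma_t(q',q,x_t)$ is the conditional branch metric given by

$$\gamma_t(q',q,x_t) = \Pr(y_t, x_t, H_{t+1} = A_{q'} \mid H_t = A_q) \tag{6}$$
$$= \Pr(x_t)\, \Pr(H_{t+1} = A_{q'} \mid H_t = A_q)\, \Pr(y_t \mid H_t = A_q, x_t) \tag{7}$$
$$= \Pr(x_t)\, P_{q',q} \exp\!\big(-|y_t - x_t A_q|^2 / N_0\big) (\pi N_0)^{-1}. \tag{8}$$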

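In the above, $\Pr(x_t) = 1/2$ for $x_t = \pm 1$ is the a
priori probability and (7) is from $\Pr(y_t, x_t \mid H_{t+1}, H_t) =
\Pr(x_t) \Pr(y_t \mid x_t, H_t)$, since $H_t$ evolves independently of $X_t$
and $y_t$ depends only on $H_t$ and $X_t$. The $\alpha_t(q)$
and $\beta_t(q)$ are computed recursively as

$$\alpha_{t+1}(q') = \sum_{q=1}^{Q} \gamma_t(q',q)\, \alpha_t(q), \quad t = 1, \dots, N \tag{9}$$
$$\beta_t(q) = \sum_{q'=1}^{Q} \gamma_t(q',q)\, \beta_{t+1}(q'), \quad t = N, \dots, 1 \tag{10}$$

where $\alpha_1(q) = P_q$ and $\beta_{N+1}(q) = P_q$ for $1 \le q \le Q$.
The estimator computes the likelihood ratio of each bit in
codeword $k$ as

$$\Lambda^k(X_t = a) = \frac{\Pr(X_t = a, \mathbf{y}, \mathbf{u}_k)}{\Pr(X_t = -a, \mathbf{y}, \mathbf{u}_k)}
= \frac{\sum_{q=1}^{Q} \sum_{q'=1}^{Q} \alpha_t(q)\, \gamma_t(q',q,a)\, \beta_{t+1}(q')}{\sum_{q=1}^{Q} \sum_{q'=1}^{Q} \alpha_t(q)\, \gamma_t(q',q,-a)\, \beta_{t+1}(q')} \tag{11}$$

for $t = t(1,k), \dots, t(N_k,k)$. The decoder treats the sequence
of likelihood ratios $\{\Lambda^k(X_{i,k} = +1)\}_{i=1}^{N_k}$ as i.i.d. samples
from a memoryless channel and decodes accordingly.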
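The estimator in (3)-(11) can be sketched as a forward-backward pass over the channel states. The code below is an illustrative sketch rather than the authors' implementation: the function and argument names are assumptions, erased symbols are represented by `None`, the recursions are normalized at every step for numerical stability (which leaves the ratio in (11) unchanged), and the backward recursion is terminated with all ones, the usual BCJR choice, whereas the text states $\beta_{N+1}(q) = P_q$.

```python
import math

def bcjr_likelihood_ratios(y, u, P, p_stat, gains, n0):
    """Return {t: Lambda(X_t = +1)} for every erased position t (0-based).

    y: received complex samples; u: known symbols (+1/-1) or None if erased;
    P[q2][q1] = Pr(H_t = A_q2 | H_{t-1} = A_q1); p_stat: stationary probabilities.
    """
    n, Q = len(y), len(P)

    def gamma(t, q2, q1, x):
        # conditional branch metric, eq. (8), with Pr(x) = 1/2
        lik = math.exp(-abs(y[t] - x * gains[q1]) ** 2 / n0) / (math.pi * n0)
        return 0.5 * P[q2][q1] * lik

    def gamma_eff(t, q2, q1):
        # branch metric, eq. (5): fix x_t if known, otherwise sum over x_t
        if u[t] is not None:
            return gamma(t, q2, q1, u[t])
        return gamma(t, q2, q1, 1) + gamma(t, q2, q1, -1)

    alpha = [list(p_stat)]
    for t in range(n):                                   # forward recursion (9)
        nxt = [sum(gamma_eff(t, q2, q1) * alpha[t][q1] for q1 in range(Q))
               for q2 in range(Q)]
        s = sum(nxt)
        alpha.append([v / s for v in nxt])
    beta = [None] * (n + 1)
    beta[n] = [1.0] * Q                                  # all-ones termination (assumption)
    for t in range(n - 1, -1, -1):                       # backward recursion (10)
        cur = [sum(gamma_eff(t, q2, q1) * beta[t + 1][q2] for q2 in range(Q))
               for q1 in range(Q)]
        s = sum(cur)
        beta[t] = [v / s for v in cur]

    ratios = {}
    for t in range(n):
        if u[t] is None:                                 # likelihood ratio (11)
            num = sum(alpha[t][q1] * gamma(t, q2, q1, 1) * beta[t + 1][q2]
                      for q1 in range(Q) for q2 in range(Q))
            den = sum(alpha[t][q1] * gamma(t, q2, q1, -1) * beta[t + 1][q2]
                      for q1 in range(Q) for q2 in range(Q))
            ratios[t] = num / den
    return ratios
```

The function returns ratios for all erased positions; when decoding level $k$, the caller would keep only the positions $t(1,k), \dots, t(N_k,k)$ and pass them to the decoder.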

The IED scheme exchanges the extrinsic soft information
repeatedly between the estimator and the decoder within each
individual subchannel. Consider the $n$th iteration at the $k$th
subchannel. Suppose the decoder output likelihood ratio at the
$(n-1)$th iteration is $\Lambda^{D}_{n-1}(X_{i,k})$; its extrinsic output is

$$L^{D,\mathrm{out}}_{n-1}(X_{i,k}) = \Lambda^{D}_{n-1}(X_{i,k}) \,/\, L^{D,\mathrm{in}}_{n-1}(X_{i,k}) \tag{12}$$

for $i = 1, \dots, N_k$, where $L^{D,\mathrm{in}}_{n-1}(X_{i,k})$ is the decoder extrinsic
input. The estimator then computes its output likelihood ratio
$\Lambda^{E}_{n}(X_{i,k})$ according to (9)-(11), except that the a priori
probability $\Pr(X_t)$ in (8) is replaced by its extrinsic input
$L^{E,\mathrm{in}}_{n}(X_{i,k}) = L^{D,\mathrm{out}}_{n-1}(X_{i,k})$, where $t = t(i,k)$. The estimator
output extrinsic information at the $n$th iteration is

$$L^{E,\mathrm{out}}_{n}(X_{i,k}) = \Lambda^{E}_{n}(X_{i,k}) \,/\, L^{E,\mathrm{in}}_{n}(X_{i,k}) \tag{13}$$

for $i = 1, \dots, N_k$, which becomes the decoder extrinsic
input $L^{D,\mathrm{in}}_{n}(X_{i,k}) = L^{E,\mathrm{out}}_{n}(X_{i,k})$. The above process starts
with the initial condition $L^{E,\mathrm{in}}_{0}(X_{i,k}) = 1$ and repeats until a
stopping criterion is satisfied, for example, when the parity-check
equations hold for an LDPC decoder or the maximum number
of iterations is reached.
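The exchange of extrinsic likelihood ratios can be summarized as a short loop. This is a schematic sketch under stated assumptions: `estimator`, `decoder`, and `stop` are hypothetical callables supplied by the caller (each of the first two maps a list of a priori likelihood ratios to a posteriori likelihood ratios), and extrinsic information is formed by dividing the output ratio by the input ratio as in (12) and (13).

```python
def iterate_estimation_decoding(estimator, decoder, n_bits, max_iters, stop):
    """Alternate estimator and decoder passes, exchanging extrinsic ratios."""
    est_in = [1.0] * n_bits          # no a priori knowledge before the first pass
    dec_post = list(est_in)
    for _ in range(max_iters):
        est_post = estimator(est_in)
        # estimator extrinsic output, cf. (13): remove what was fed in
        est_out = [p / a for p, a in zip(est_post, est_in)]
        dec_post = decoder(est_out)
        if stop(dec_post):           # e.g. parity checks satisfied
            break
        # decoder extrinsic output, cf. (12), becomes the next estimator input
        est_in = [p / a for p, a in zip(dec_post, est_out)]
    return dec_post
```

Keeping the two divisions explicit makes it clear that each component only ever receives information it did not produce itself, which is the property the EXIT chart analysis relies on.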



III. Mutual Information 

This section introduces two mutual information measures: the
achievable rate $R$ and the i.u.d. capacity $C^{\mathrm{i.u.d.}}$. Both of them
are derived for the i.u.d. binary channel inputs. While $C^{\mathrm{i.u.d.}}$
denotes the maximal achievable information rate given any
receiver, $R$ denotes the achievable information rate of the SED
algorithm. Note that the channel capacity $C \ge C^{\mathrm{i.u.d.}}$ requires an
optimal input distribution. The distance between $R$ and $C^{\mathrm{i.u.d.}}$ is
then used to show how fast a finite-level rectangular-interleaver
based scheme converges.



A. The Achievable Rate and the i.u.d. Capacity 

Analogous to a multiuser system [29], the $K$ independent
codewords in a successive decoding scheme with perfect
decision feedback¹ are equivalently transmitted over $K$ parallel
subchannels (also called equivalent channels [11]). The $k$th
subchannel is defined as a channel with a vector input $\mathbf{X}_k$,
a vector output $\mathbf{Y}$, a training sequence $\mathbf{U}_k$, and a channel
transition probability $\Pr(\mathbf{Y} \mid \mathbf{X}_k, \mathbf{U}_k)$. For either the
deterministic or the random interleavers, the subchannel is a stationary,
ergodic, and indecomposable [27] FSMC. Following [27], we
define the i.u.d. capacity of each subchannel as follows.

Definition 3: The i.u.d. capacity of the $k$th subchannel is
the mutual information between the i.u.d. input vector $\mathbf{X}_k$ and
the output vector $\mathbf{Y}$ conditioned on the training sequence $\mathbf{U}_k$

$$C_k^{\mathrm{i.u.d.}} = \lim_{N \to \infty} \frac{1}{N_k} I(\mathbf{X}_k; \mathbf{Y} \mid \mathbf{U}_k) \tag{14}$$

and the i.u.d. capacity of the physical channel (1) is the mutual
information between the i.u.d. input vector $\mathbf{X}$ and the output
vector $\mathbf{Y}$

$$C^{\mathrm{i.u.d.}} = \lim_{N \to \infty} \frac{1}{N} I(\mathbf{X}; \mathbf{Y}). \tag{15}$$

Applying the chain rule of mutual information to (15) yields

$$C^{\mathrm{i.u.d.}} = \sum_{k=1}^{K} w_k C_k^{\mathrm{i.u.d.}}. \tag{16}$$

Achieving (14) with one code would require a joint
maximum-likelihood decoder. On the other hand, the simple SED over a
subchannel can achieve the following rate.

¹ Throughout the paper, the decision feedback is assumed to be perfect. The
effects of imperfect decisions are minimal in the proposed successive decoding
because decisions are generated by a strong component code, which, when
operating in the designed region, will produce a small BER that has little
effect on the estimation process in the later stages. A more detailed discussion
of imperfect decision feedback can be found in [18].

Definition 4: The achievable rate of the $k$th subchannel
with SED is the average of the conditional mutual information
between the individual bit $X_{i,k}$ and the channel output $\mathbf{Y}$

$$R_k = \lim_{N \to \infty} \frac{1}{N_k} \sum_{i=1}^{N_k} I(X_{i,k}; \mathbf{Y} \mid \mathbf{U}_k) \tag{17}$$

and the overall achievable rate of the physical channel (1) is

$$R = \sum_{k=1}^{K} w_k R_k. \tag{18}$$

The achievable rate is the maximal rate of error-free
communication with the suboptimal decoding rule [30] that
approximates the true APP $\Pr(\mathbf{X}_k \mid \mathbf{Y}, \mathbf{U}_k)$ by a product
$\prod_{i=1}^{N_k} \Pr(X_{i,k} \mid \mathbf{Y}, \mathbf{U}_k)$ under the (memoryless subchannel)
assumption that $\{X_{i,k}\}_{i=1}^{N_k}$ are conditionally independent. It is a
lower bound of the subchannel i.u.d. capacity since

$$C_k^{\mathrm{i.u.d.}} - R_k = \lim_{N \to \infty} \frac{1}{N_k} E\left[ \log_2 \frac{\Pr(\mathbf{X}_k \mid \mathbf{Y}, \mathbf{U}_k)}{\prod_{i=1}^{N_k} \Pr(X_{i,k} \mid \mathbf{Y}, \mathbf{U}_k)} \right]
= \lim_{N \to \infty} \frac{1}{N_k} D_{KL}\Big( \Pr(\mathbf{X}_k \mid \mathbf{Y}, \mathbf{U}_k) \,\Big\|\, \prod_{i=1}^{N_k} \Pr(X_{i,k} \mid \mathbf{Y}, \mathbf{U}_k) \Big) \ge 0 \tag{19}$$

where (19) is due to the non-negativity of the Kullback-Leibler
distance $D_{KL}$ [29]. The information rate (17) was also derived
in [8] as the performance bound of decoding LDPC codes over
ISI channels with a single pass of the BCJR algorithm.

B. Estimation of the Achievable Rate and the i.u.d. Capacity 

From the definition of the mutual information and the
entropy [29], expression (17) can be calculated as follows

$$R_k = \lim_{N \to \infty} \frac{1}{N_k} \sum_{i=1}^{N_k} \big( H(X_{i,k}) - H(X_{i,k} \mid \mathbf{Y}, \mathbf{U}_k) \big)$$
$$= 1 - \lim_{N \to \infty} \frac{1}{N_k} E\left[ \sum_{i=1}^{N_k} -\log_2 \Pr(X_{i,k} = x_{i,k} \mid \mathbf{Y} = \mathbf{y}, \mathbf{U}_k = \mathbf{u}_k) \right] \tag{20}$$
$$= 1 - \lim_{N \to \infty} \frac{1}{N_k} \sum_{i=1}^{N_k} E_{x_{i,k}, \mathbf{y}}\left[ -\log_2 \frac{\Lambda^k(x_{i,k})}{1 + \Lambda^k(x_{i,k})} \right]. \tag{21}$$

The expectation in (21) can be evaluated through a Monte-Carlo
integration, which simulates the channel output and
computes $\{\Lambda^k(x_{i,k})\}$ using the BCJR algorithm according to
(9)-(11). In fact, the computation of $R_k$ directly corresponds
to the estimation process in the SED, except that the likelihood
ratio of the actual input realization $x_{i,k}$ must be used.
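Given likelihood ratios from the estimator, the Monte-Carlo average in (21) is a one-line reduction. The sketch below assumes the estimator reports $\Lambda(X = +1)$ for each bit, so the ratio of the transmitted value is the reported ratio or its reciprocal; both helper names are illustrative assumptions.

```python
import math

def ratio_of_true_symbol(lam_plus, x):
    """Convert Lambda(X = +1) into the ratio of the transmitted value x."""
    return lam_plus if x == 1 else 1.0 / lam_plus

def achievable_rate_estimate(ratios_true):
    """Sample average of (21); uses -log2(L / (1 + L)) = log2(1 + 1 / L)."""
    total = sum(math.log2(1.0 + 1.0 / lam) for lam in ratios_true)
    return 1.0 - total / len(ratios_true)
```

An uninformative estimator (all ratios equal to one) gives a rate estimate of zero, while very large ratios in favor of the transmitted bits push the estimate toward one bit per channel use.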

Applying the chain rule of mutual information to (14) yields

$$C_k^{\mathrm{i.u.d.}} = \lim_{N \to \infty} \frac{1}{N_k} \sum_{i=1}^{N_k} I(X_{i,k}; \mathbf{Y} \mid \mathbf{U}_k, X_{1,k}, \dots, X_{i-1,k}). \tag{22}$$

According to (22), we introduce an additional training
sequence $\{x_{1,k}, \dots, x_{i-1,k}\}$ in the forward recursion of the
BCJR algorithm to compute a new likelihood ratio of $x_{i,k}$

$$\tilde{\Lambda}^k(x_{i,k}) = \frac{\Pr(x_{i,k}, \mathbf{y}, \mathbf{u}_k, x_{1,k}, \dots, x_{i-1,k})}{\Pr(-x_{i,k}, \mathbf{y}, \mathbf{u}_k, x_{1,k}, \dots, x_{i-1,k})} \tag{23}$$

for $i = 1, \dots, N_k$. Hence

$$C_k^{\mathrm{i.u.d.}} = 1 - \lim_{N \to \infty} \frac{1}{N_k} \sum_{i=1}^{N_k} E_{x_{i,k}, \mathbf{y}}\left[ -\log_2 \frac{\tilde{\Lambda}^k(x_{i,k})}{1 + \tilde{\Lambda}^k(x_{i,k})} \right]. \tag{24}$$

Note that combining (16) and (24) yields the i.u.d. capacity of the
physical channel with memory. This is an alternative to the
methods presented in [23] and [13] and was hinted at, though
not pursued, in [23].

C. Convergence of the Achievable Rate to the i.u.d. Capacity 

Under the mild conditions of a positive state-transition matrix
and noisy channel output, a hidden Markov model has
exponential decay of the channel memory such that the difference
between the state estimates at time $t$ with or without the initial
channel knowledge at $t - n$ goes to zero exponentially fast with
respect to $n$ due to state mixing; see for example [31] and [32].
This implies that the increase in mutual information between
$X_t$ and $\mathbf{Y}$ due to some additional training symbols located at
least $n$ symbols away also goes to zero at an exponential rate,
as shown in Lemma 1.

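Lemma 1: Assume that the channel state-transition matrix
$\mathbf{P}$ defined in (2) is primitive, i.e., $P_{q',q} > 0$, and assume that
$\Pr(y_t \mid H_t = A_q, X_t) > 0$ for any $1 \le q \le Q$, $1 \le q' \le Q$,
$X_t \in \{-1,+1\}$, and $1 \le t \le N$. The conditional mutual
information $I\big(X_t; (\mathbf{Y})_{t-m}^{t+n} \mid (\mathbf{U})_{t-m}^{t+n}\big)$, where $\mathbf{U}$ is an arbitrary
sequence of training symbols, will converge exponentially fast
with respect to $m$ for any $n \ge 0$ so that

$$I\big(X_t; (\mathbf{Y})_{t-m'}^{t+n} \mid (\mathbf{U})_{t-m'}^{t+n}\big) - I\big(X_t; (\mathbf{Y})_{t-m}^{t+n} \mid (\mathbf{U})_{t-m}^{t+n}\big)
\le \ln(2)^{-1} \max_{1 \le i \le Q,\, 1 \le j \le Q} d(\mathbf{p}_i, \mathbf{p}_j)\, \tau(\mathbf{P})^{m-1}$$

where $m' > m > 0$. Here $\mathbf{p}_i$ for $1 \le i \le Q$ are the column
vectors of $\mathbf{P}$, $0 \le \tau(\mathbf{P}) < 1$ is the Birkhoff contraction
coefficient of a strictly positive matrix $\mathbf{P}$ defined as

$$\tau(\mathbf{P}) = \sup_{\mathbf{u} > 0,\, \mathbf{v} > 0,\, \mathbf{u} \ne \lambda \mathbf{v}} \frac{d(\mathbf{P}\mathbf{u}, \mathbf{P}\mathbf{v})}{d(\mathbf{u}, \mathbf{v})} \tag{25}$$

and

$$d(\mathbf{u}, \mathbf{v}) = \ln \max_{i,j} \frac{u(i)\, v(j)}{v(i)\, u(j)} \tag{26}$$

is the Hilbert metric between two positive vectors $\mathbf{u}$ and $\mathbf{v}$.
Proof: See Appendix I. ∎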

The statement and the proof of Lemma 1 involve the theory
of the product of positive matrices; see [31], [33], and [34].
The properties of both the Hilbert metric and the Birkhoff
contraction coefficient can be found in [31].
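For a concrete channel, the quantities in Lemma 1 are easy to evaluate numerically. The sketch below is illustrative: the function names are assumptions, and instead of evaluating the supremum in (25) it uses the standard closed form for strictly positive matrices, $\tau = (1 - \sqrt{\phi}) / (1 + \sqrt{\phi})$ with $\phi = \min_{i,j,k,l} P_{ik} P_{jl} / (P_{jk} P_{il})$, which is a known result from the theory of non-negative matrices and is not stated in the text.

```python
import math

def hilbert_metric(u, v):
    """Hilbert metric (26) between two strictly positive vectors."""
    ratios = [a / b for a, b in zip(u, v)]
    return math.log(max(ratios) / min(ratios))

def birkhoff_coefficient(P):
    """Birkhoff contraction coefficient of a strictly positive matrix (closed form)."""
    rows, cols = range(len(P)), range(len(P[0]))
    phi = min((P[i][k] * P[j][l]) / (P[j][k] * P[i][l])
              for i in rows for j in rows for k in cols for l in cols)
    root = math.sqrt(phi)
    return (1.0 - root) / (1.0 + root)

def lemma1_bound(P, m):
    """Right-hand side of the Lemma 1 inequality for a window of m past symbols."""
    columns = [[P[r][c] for r in range(len(P))] for c in range(len(P[0]))]
    d_max = max(hilbert_metric(a, b) for a in columns for b in columns)
    return d_max * birkhoff_coefficient(P) ** (m - 1) / math.log(2)
```

For a hypothetical two-state matrix `[[0.9, 0.2], [0.1, 0.8]]` the coefficient evaluates to 5/7, so the bound shrinks by roughly 29% for every additional symbol of separation.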

Applying Lemma 1 to the rectangular interleaver, the
following theorem shows that, as $K \to \infty$,
$\Pr(\mathbf{X}_k \mid \mathbf{Y}, \mathbf{U}_k) \to \prod_{i=1}^{N_k} \Pr(X_{i,k} \mid \mathbf{Y}, \mathbf{U}_k)$ and the subchannel converges to a
memoryless channel where the SED is indeed optimal.
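
Theorem 1: Under the same assumptions as Lemma 1,
the achievable rate $R$ of a $K$-level rectangular interleaver
approaches $C^{\mathrm{i.u.d.}}$ exponentially fast with respect to $K$, so

$$C^{\mathrm{i.u.d.}} - R \le \ln(2)^{-1} \max_{1 \le i \le Q,\, 1 \le j \le Q} d(\mathbf{p}_i, \mathbf{p}_j)\, \tau(\mathbf{P})^{K-2} \tag{27}$$

where $\mathbf{P}$ is the state-transition matrix (2), $\mathbf{p}_i$ is the column
vector of $\mathbf{P}$, and $0 \le \tau(\mathbf{P}) < 1$ and $d(\mathbf{p}_i, \mathbf{p}_j) \ge 0$ are the
Birkhoff contraction coefficient (25) and the Hilbert metric
(26), respectively.

Proof: From (22) and (17), the difference between $C_k^{\mathrm{i.u.d.}}$
and $R_k$ is upper bounded by

$$C_k^{\mathrm{i.u.d.}} - R_k = \lim_{N \to \infty} \frac{1}{N_k} \sum_{i=1}^{N_k} \Big[ I(X_{i,k}; \mathbf{Y} \mid \mathbf{U}_k, X_{1,k}, \dots, X_{i-1,k}) - I(X_{i,k}; \mathbf{Y} \mid \mathbf{U}_k) \Big] \tag{28}$$
$$\le \lim_{N \to \infty} \frac{1}{N_k} \sum_{i=1}^{N_k} \Big[ I(X_{i,k}; \mathbf{Y} \mid \mathbf{U}_k, X_{1,k}, \dots, X_{i-1,k}) - I\big(X_{i,k}; (\mathbf{Y})_{t(i,k)-K+1}^{N} \mid (\mathbf{U}_k)_{t(i,k)-K+1}^{N}\big) \Big] \tag{29}$$
$$\le \ln(2)^{-1} \max_{1 \le i \le Q,\, 1 \le j \le Q} d(\mathbf{p}_i, \mathbf{p}_j)\, \tau(\mathbf{P})^{K-2} \tag{30}$$

for $k = 1, \dots, K$, where (29) is true because eliminating
some channel outputs and training symbols reduces mutual
information and (30) is a result of Lemma 1. The result (27)
follows from (30) and the fact that $C^{\mathrm{i.u.d.}} = \sum_{k=1}^{K} C_k^{\mathrm{i.u.d.}} / K$
and $R = \sum_{k=1}^{K} R_k / K$. ∎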



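IV. Design for Separate Estimation and Decoding

In this section, we first show some properties of the random
interleaver and then address the random interleaver optimization
problem using a pilot-utility function. We also give a good
deterministic interleaver constructed according to empirical
rules.

A. Properties of the Random Interleaver

Under a random interleaver, the configuration of a
subchannel as well as its achievable rate and i.u.d. capacity
depend on the realization of the interleaver. We define the
(ensemble) average of $C_k^{\mathrm{i.u.d.}}$ and the (ensemble) average of
$R_k$, respectively, as

$$\mathcal{C}_k^{\mathrm{i.u.d.}} = \lim_{N \to \infty} E_{\boldsymbol{\Pi}}\left[ \frac{1}{N_k} I_{\boldsymbol{\Pi}}(\mathbf{X}_k; \mathbf{Y} \mid \mathbf{U}_k) \right] \tag{31}$$
$$\mathcal{R}_k = \lim_{N \to \infty} E_{\boldsymbol{\Pi}}\left[ \frac{1}{N_k} \sum_{i=1}^{N_k} I_{\boldsymbol{\Pi}}(X_{i,k}; \mathbf{Y} \mid \mathbf{U}_k) \right] \tag{32}$$

where the subscript $\boldsymbol{\Pi}$ is introduced in the mutual information
to denote its dependency on the underlying interleaver pattern.
The (ensemble) average of $C^{\mathrm{i.u.d.}}$ and the (ensemble) average
of $R$ are respectively

$$\mathcal{C}^{\mathrm{i.u.d.}} = \sum_{k=1}^{K} w_k\, \mathcal{C}_k^{\mathrm{i.u.d.}} \tag{33}$$

and

$$\mathcal{R} = \sum_{k=1}^{K} w_k\, \mathcal{R}_k. \tag{34}$$

The following proposition shows that, as the length of a
random interleaver realization goes to infinity, both the i.u.d.
capacity and the achievable rate of a subchannel converge to
their ensemble averages in probability.

Proposition 1: For a random interleaver $\boldsymbol{\Pi}$ with i.i.d.
entries, as $N \to \infty$

$$I_{\boldsymbol{\Pi}}(\mathbf{X}_k; \mathbf{Y} \mid \mathbf{U}_k) / N_k \to \mathcal{C}_k^{\mathrm{i.u.d.}} \tag{35}$$
$$\sum_{i=1}^{N_k} I_{\boldsymbol{\Pi}}(X_{i,k}; \mathbf{Y} \mid \mathbf{U}_k) / N_k \to \mathcal{R}_k. \tag{36}$$

Proof: This proof will show (35) only. The proof of (36)
is similar. Partition the $N$-bit interleaver $\boldsymbol{\Pi} = [\pi_1, \dots, \pi_N]$
into $n$ $m$-bit long blocks as $\boldsymbol{\Pi} = [\boldsymbol{\Pi}_1, \dots, \boldsymbol{\Pi}_n]$, where
$N = nm$ and $\boldsymbol{\Pi}_j = [\pi_{(j-1)m+1}, \dots, \pi_{jm}]$ for $j = 1, \dots, n$.
The subsequences that lie within the $j$th block are denoted by
$\mathbf{X}_k^{(j)} = (\mathbf{X}_k)_{(j-1)m+1}^{jm}$, $\mathbf{Y}^{(j)} = (\mathbf{Y})_{(j-1)m+1}^{jm}$, and
$\mathbf{U}_k^{(j)} = (\mathbf{U}_k)_{(j-1)m+1}^{jm}$. The length of $\mathbf{X}_k^{(j)}$ is denoted by $N_k^{(j)}$.
Since the channel is stationary, ergodic, and indecomposable
[27] and so is the random interleaver, it can be shown (see [27]
and [18]) that as $m \to \infty$

$$\lim_{N \to \infty} I_{\boldsymbol{\Pi}}(\mathbf{X}_k; \mathbf{Y} \mid \mathbf{U}_k) = \lim_{n \to \infty} \sum_{j=1}^{n} I_{\boldsymbol{\Pi}_j}\big(\mathbf{X}_k^{(j)}; \mathbf{Y}^{(j)} \mid \mathbf{U}_k^{(j)}\big).$$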

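Furthermore, $\lim_{m \to \infty} n N_k^{(j)} / N_k = 1$ for any $j$. Therefore

$$\lim_{N \to \infty} \frac{1}{N_k} I_{\boldsymbol{\Pi}}(\mathbf{X}_k; \mathbf{Y} \mid \mathbf{U}_k)
= \lim_{m \to \infty} \lim_{n \to \infty} \frac{1}{N_k} \sum_{j=1}^{n} I_{\boldsymbol{\Pi}_j}\big(\mathbf{X}_k^{(j)}; \mathbf{Y}^{(j)} \mid \mathbf{U}_k^{(j)}\big)$$
$$= \lim_{m \to \infty} E_{\boldsymbol{\Pi}_j}\left[ \frac{1}{N_k^{(j)}} I_{\boldsymbol{\Pi}_j}\big(\mathbf{X}_k^{(j)}; \mathbf{Y}^{(j)} \mid \mathbf{U}_k^{(j)}\big) \right] \tag{37}$$
$$= \mathcal{C}_k^{\mathrm{i.u.d.}} \tag{38}$$

where (37) is due to the stationarity and ergodicity of the
random interleaver, and (38) is due to the definition (31) and
the fact that $\boldsymbol{\Pi}_j$ and $\boldsymbol{\Pi}$ have the same statistics. ∎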

Therefore, if a specific weight distribution yields an optimal 
ensemble average achievable rate, we can generate a suffi- 
ciently long interleaver realization to achieve the same rate. 

The random interleaver has an asymptotic property similar
to that of the rectangular interleaver. If $w_k \to 0$, symbols of
codeword $k$ are expected to be scattered far away from each
other and $\mathcal{R}_k$ will approach $\mathcal{C}_k^{\mathrm{i.u.d.}}$, as shown in the following
lemma.

Lemma 2: Let $w_k$ be the weight of subchannel $k$; then

$$\lim_{w_k \to 0} \big( \mathcal{C}_k^{\mathrm{i.u.d.}} - \mathcal{R}_k \big) = 0, \quad k = 1, \dots, K.$$

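Proof: By the definitions in (31) and (32) and the chain rule of mutual information, we get

$$\mathbb{C}_k^{\mathrm{i.u.d.}} - \mathbb{R}_k = \lim_{N\to\infty} E_\Pi\left[\frac{1}{N_k}\, I_\Pi(\mathbf{X}_k;\mathbf{Y}\mid\mathbf{U}_k)\right] - \mathbb{R}_k$$

$$= \lim_{N\to\infty} E_\Pi\left[\frac{1}{N_k}\sum_{i=1}^{N_k}\Big( I_\Pi(X_{i,k};\mathbf{Y}\mid\mathbf{U}_k, X_{1,k},\cdots,X_{i-1,k}) - I_\Pi(X_{i,k};\mathbf{Y}\mid\mathbf{U}_k)\Big)\right]. \qquad (39)$$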



Let the distance between the consecutive bits $X_{i,k}$ and $X_{i-1,k}$ be

$$D_{i,k} = t(i,k) - t(i-1,k), \qquad i = 2,\cdots,N_k.$$

The random sequence $\{D_{i,k}\}_{i=2}^{N_k}$ is i.i.d. with probability mass function

$$\Pr(D_{i,k} = d) = w_k(1-w_k)^{d-1}, \qquad d \ge 1.$$



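Let $a = \ln(2)^{-1}\max_{1\le i\le Q,\,1\le j\le Q} d(\mathbf{p}_i,\mathbf{p}_j)$. Applying Lemma 1 to (39) yields

$$\mathbb{C}_k^{\mathrm{i.u.d.}} - \mathbb{R}_k \le \lim_{N\to\infty} E_\Pi\left[\frac{1}{N_k}\sum_{i=2}^{N_k} a\,\tau(P)^{D_{i,k}-1}\right] = \sum_{D_{i,k}=1}^{\infty} a\,w_k(1-w_k)^{D_{i,k}-1}\,\tau(P)^{D_{i,k}-1} = \frac{a\,w_k}{1-\tau(P)(1-w_k)}. \qquad (40)$$

Since $0 < \tau(P) < 1$, letting $w_k \to 0$ in (40) completes the proof. ■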

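This implies that a random interleaver with equal weights $w_k = 1/K$ is asymptotically optimal, as follows.

Theorem 2: For a $K$-level random interleaver with weight distribution $\mathbf{w} = [1/K,\cdots,1/K]$,

$$\lim_{K\to\infty} \mathbb{R} = C^{\mathrm{i.u.d.}}.$$

Proof: Since the i.u.d. capacity is independent of any interleaving scheme, we have $C^{\mathrm{i.u.d.}} = \mathbb{C}^{\mathrm{i.u.d.}} = \sum_{k=1}^{K} w_k\,\mathbb{C}_k^{\mathrm{i.u.d.}}$. Substituting $w_k = 1/K$ into (40) and using (33) and (34), we get

$$C^{\mathrm{i.u.d.}} - \mathbb{R} = \sum_{k=1}^{K} w_k\big(\mathbb{C}_k^{\mathrm{i.u.d.}} - \mathbb{R}_k\big) \le \frac{\ln(2)^{-1}\max_{1\le i\le Q,\,1\le j\le Q} d(\mathbf{p}_i,\mathbf{p}_j)\,K^{-1}}{1-\tau(P)(1-K^{-1})}.$$

Letting $K\to\infty$ completes the proof. ■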



B. Optimization of the Random Interleaver 

Note that $\mathbb{R}_k(\mathbf{w})$ depends on $\mathbf{w}$ only through $\sum_{i=1}^{k-1} w_i$, the percentage of the training symbols at subchannel $k$. It is useful to quantify the utility (in terms of mutual information) of the randomly positioned pilot symbols.

Definition 5: The pilot-utility function $\mu(x)$ is defined as the achievable rate of the data as a function of the pilot percentage $x$,

$$\mu(x) = \lim_{m,n\to\infty} I\big(X_t; (\mathbf{Y})_{t-m}^{t+n} \,\big|\, (\mathbf{Z})_{t-m}^{t-1}, (\mathbf{Z})_{t+1}^{t+n}\big) \qquad (41)$$

for $x \in [0,1]$, where $Z_t = X_t$ with probability $x$ and $Z_t$ is unknown (erased) with probability $1-x$.

The function $\mu(x): [0,1]\to[0,1]$ is assumed to be continuously differentiable in $(0,1)$ and is shown to be monotonically increasing, $\mu(x) \le \mu(y)$ for $0 \le x \le y \le 1$, in Appendix III. A Monte-Carlo method similar to that in Section III-B can be used to estimate $\mu(x)$. The pilot-utility function can also be viewed as the EXIT function where the a priori probabilities are passed through a binary erasure channel (BEC) [21]. The following two theorems show that $\mathbb{R}_k$ and $\mathbb{C}_k^{\mathrm{i.u.d.}}$ are simply the evaluations on $\mu(x)$.

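Theorem 3: Let $\mu(x)$ be the pilot-utility function of the FSMC. The average achievable rate of level $k$ in successive decoding under a random interleaver is

$$\mathbb{R}_k(\mathbf{w}) = \mu\Big(\sum_{j=1}^{k-1} w_j\Big) \qquad (42)$$

for $k = 1,\cdots,K$. The overall average achievable rate is

$$\mathbb{R}(\mathbf{w}) = \sum_{k=1}^{K} w_k\,\mu\Big(\sum_{j=1}^{k-1} w_j\Big). \qquad (43)$$
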
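Proof: As $N \to \infty$, almost all terms $I_\Pi(X_{i,k};\mathbf{Y}\mid\mathbf{U}_k)$ inside the summation of (32) will converge to one windowed term $\lim_{m,n\to\infty} I_\Pi\big(X_{i,k}; (\mathbf{Y})_{t-m}^{t+n} \mid (\mathbf{U}_k)_{t-m}^{t+n}\big)$, with $t = t(i,k)$, due to the exponential decay of the FSMC channel memory as shown in Lemma 1; also see [18]. So the definition of $\mathbb{R}_k$ in (32) can be re-written as

$$\mathbb{R}_k = \lim_{m,n\to\infty} E_\Pi\Big[ I_\Pi\big(X_{i,k}; (\mathbf{Y})_{t-m}^{t+n} \,\big|\, (\mathbf{U}_k)_{t-m}^{t+n}\big)\Big]. \qquad (44)$$

By the construction of the random interleaver, $\mathbf{U}_k = [U_1,\cdots,U_N]$, where $U_t = X_t$ with probability $\sum_{i=1}^{k-1} w_i$ and $U_t$ is unknown with probability $1-\sum_{i=1}^{k-1} w_i$ for $t \ne t(i,k)$, is equivalent to a sequence of random training symbols. Therefore, from (41) and (44) and the stationarity and ergodicity of both the channel and the interleaver, we have $\mathbb{R}_k(\mathbf{w}) = \mu\big(\sum_{j=1}^{k-1} w_j\big)$. Then (43) holds since $\mathbb{R} = \sum_{k=1}^{K} w_k\mathbb{R}_k$. ■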

Theorem 4: Area property. Let $\mu(x)$ be the pilot-utility function of the FSMC. The average i.u.d. capacity of level $k$ in successive decoding under a random interleaver is

$$\mathbb{C}_k^{\mathrm{i.u.d.}} = \frac{1}{w_k}\int_{\sum_{j=1}^{k-1} w_j}^{\sum_{j=1}^{k} w_j} \mu(x)\,dx \qquad (45)$$

where $w_k \ne 0$ for $k = 1,\cdots,K$. The overall i.u.d. capacity is

$$C^{\mathrm{i.u.d.}} = \int_0^1 \mu(x)\,dx. \qquad (46)$$
Jq 

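Proof: Let $\mathbf{w} = [w_1,\cdots,w_K]$ be the weight distribution of a $K$-level random interleaving scheme. Consider a new interleaving scheme that further divides the subchannel $k$ of weight $w_k$ into $m$ sub-subchannels of weights $w_k/m$. These sub-subchannels are indexed by $k_1,\cdots,k_m$ and the new weight distribution is $\mathbf{w}' = [w_1,\cdots,w_{k-1},w_k/m,\cdots,w_k/m,w_{k+1},\cdots,w_K]$. Similar to (16), by the chain rule of mutual information we have

$$\mathbb{C}_k^{\mathrm{i.u.d.}} = \frac{1}{m}\sum_{i=1}^{m} \mathbb{C}_{k_i}^{\mathrm{i.u.d.}}. \qquad (47)$$

As $m\to\infty$, $w_k/m \to 0$ and $\mathbb{C}_{k_i}^{\mathrm{i.u.d.}} \to \mathbb{R}_{k_i}$ for $i = 1,\cdots,m$ from Lemma 2. Therefore, letting $m\to\infty$ in the right hand side of (47), we have

$$\mathbb{C}_k^{\mathrm{i.u.d.}} = \lim_{m\to\infty}\frac{1}{m}\sum_{i=1}^{m}\mathbb{R}_{k_i}(\mathbf{w}') \qquad (48)$$

$$= \lim_{m\to\infty}\frac{1}{m}\sum_{i=1}^{m}\mu\Big(\sum_{j=1}^{k-1} w_j + \frac{(i-1)\,w_k}{m}\Big) \qquad (49)$$

$$= \frac{1}{w_k}\int_{\sum_{j=1}^{k-1} w_j}^{\sum_{j=1}^{k} w_j}\mu(x)\,dx \qquad (50)$$

where (48) is due to Lemma 2, (49) is due to (42), and (50) is due to the continuity of $\mu(x)$. Equation (46) then follows from (33). ■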
Theorems 3 and 4 are illustrated graphically in Fig. 1 and 2, respectively. Fig. 1 shows that $\mathbb{R}_k$ is equal to $\mu(x)$ evaluated at $x = \sum_{j=1}^{k-1} w_j$ and that $\mathbb{R}$ is the area under a stair-like curve. Fig. 2 shows that $\mathbb{C}_k^{\mathrm{i.u.d.}}$ is the area beneath $\mu(x)$ between $x = \sum_{j=1}^{k-1} w_j$ and $x = \sum_{j=1}^{k} w_j$ normalized by $w_k$, and $C^{\mathrm{i.u.d.}}$ is the area underneath $\mu(x)$ between $x = 0$ and $x = 1$. The rate loss of doing SED at level $k$ is the normalized area

$$S_k = \frac{1}{w_k}\int_{\sum_{j=1}^{k-1} w_j}^{\sum_{j=1}^{k} w_j}\mu(x)\,dx - \mu\Big(\sum_{j=1}^{k-1} w_j\Big)$$

between the stair-like curve and the pilot-utility function, as shown in Fig. 1. The statement (46) was also shown as the area property of an EXIT function [21].
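
Theorems 3 and 4 can be checked numerically once a pilot-utility function is available. The following is a minimal sketch, assuming a purely hypothetical concave function `mu` (it is not the estimated function of the example FSMC) and a hypothetical equal-weight 4-level distribution; the helper names are illustrative only.

```python
def mu(x):
    # Hypothetical pilot-utility function (concave, strictly increasing on [0, 1]).
    # It only stands in for the Monte-Carlo estimate described in the text.
    return 0.9 * x - 0.3 * x * x


def simpson(f, a, b, n=200):
    # Composite Simpson rule with an even number n of sub-intervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def level_rates(w):
    """Return (R_k, C_k, S_k) for every level of the weight distribution w."""
    out, sigma = [], 0.0
    for wk in w:
        r_k = mu(sigma)                             # Theorem 3, eq. (42)
        c_k = simpson(mu, sigma, sigma + wk) / wk   # Theorem 4, eq. (45)
        out.append((r_k, c_k, c_k - r_k))           # S_k: rate loss of SED
        sigma += wk
    return out


w = [0.25, 0.25, 0.25, 0.25]
rates = level_rates(w)
overall_rate = sum(wk * r for wk, (r, _, _) in zip(w, rates))      # eq. (43)
overall_capacity = sum(wk * c for wk, (_, c, _) in zip(w, rates))  # eq. (46)
```

With this toy `mu`, the weighted sum of the per-level capacities equals the integral of `mu` over [0, 1], while the achievable rate is the smaller area under the stair-like curve.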

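The optimal weight distribution of a random interleaver is the solution to the following maximization problem:

$$\max_{\mathbf{w}}\ \mathbb{R}(\mathbf{w}) = \sum_{k=1}^{K} w_k\,\mu\Big(\sum_{j=1}^{k-1} w_j\Big) \qquad (51)$$

subject to

$$\sum_{k=1}^{K} w_k = 1$$

and

$$0 \le w_k \le 1, \qquad k = 1,\cdots,K.$$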
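
For a small number of levels, problem (51) can be sanity-checked by brute force. The sketch below is an exhaustive grid search over the simplex for $K = 3$; it is not the KKT-based procedure of this section, and the function `mu` is a hypothetical concave example rather than the pilot-utility function of the example channel.

```python
def mu(x):
    # Hypothetical concave, strictly increasing pilot-utility function.
    return 0.9 * x - 0.3 * x * x


def overall_rate(w):
    # Objective of (51): R(w) = sum_k w_k * mu(sum_{j<k} w_j).
    total, sigma = 0.0, 0.0
    for wk in w:
        total += wk * mu(sigma)
        sigma += wk
    return total


def grid_search_3_levels(steps=100):
    # Enumerate w = (i, j, steps - i - j) / steps, which covers the simplex
    # sum(w) = 1, 0 <= w_k <= 1 on a grid of resolution 1/steps.
    best_w, best_r = None, -1.0
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            r = overall_rate(w)
            if r > best_r:
                best_w, best_r = w, r
    return best_w, best_r


best_w, best_r = grid_search_3_levels()
equal_r = overall_rate((1 / 3, 1 / 3, 1 / 3))
```

For this toy function the grid optimum puts more weight on the higher levels and slightly exceeds the equal-weight rate; the cost of the search grows quickly with $K$, which is why a structured solution is preferable.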



Before solving (51), we show a property of an optimal solution: it is always advantageous to use all levels allowed.

Proposition 2: If $\mu(x)$ is strictly increasing, then $\mathbb{R}(\mathbf{w}^*_{K+1}) > \mathbb{R}(\mathbf{w}^*_K)$, where $\mathbf{w}^*_K$ and $\mathbf{w}^*_{K+1}$ are the optimal weight distributions of the $K$-level and $(K+1)$-level successive decoding schemes, respectively.

Proof: Let $\mathbf{w}^*_K = [w_1,\cdots,w_K]$ be the optimal point. Without loss of generality, we assume $w_i > 0$ for all $i$ because a $K$-level scheme with a zero-weight level is equivalent to a $(K-1)$-level scheme. Now construct a $(K+1)$-level weight distribution $\mathbf{w}'_{K+1} = [w'_1,\cdots,w'_{K+1}]$ by splitting $w_K$ into two equally weighted levels so that $w'_i = w_i$ for $i = 1,\cdots,K-1$ and $w'_i = w_K/2$ for $i = K, K+1$.

From (42), $\mathbb{R}_i(\mathbf{w}'_{K+1}) = \mathbb{R}_i(\mathbf{w}^*_K) = \mu\big(\sum_{j=1}^{i-1} w_j\big)$ for $i = 1,\cdots,K$. By the strict monotonicity of $\mu(x)$, we have

$$\mathbb{R}_{K+1}(\mathbf{w}'_{K+1}) = \mu\Big(\sum_{i=1}^{K-1} w_i + \frac{w_K}{2}\Big) > \mathbb{R}_K(\mathbf{w}^*_K).$$

From (34) and (51), there exists an optimal point $\mathbf{w}^*_{K+1}$ so that

$$\mathbb{R}(\mathbf{w}^*_{K+1}) \ge \mathbb{R}(\mathbf{w}'_{K+1}) > \mathbb{R}(\mathbf{w}^*_K).$$
A direct result of Proposition 2 is that an optimal weight distribution must not have zero terms, as follows.

Proposition 3: The local optimum $\mathbf{w}^*$ satisfies $w_k^* > 0$ for $k = 1,\cdots,K$ if $\mu(x)$ is strictly increasing.

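The Karush-Kuhn-Tucker (KKT) conditions are necessary for the solution of the nonlinear maximization problem (51), with both equality and inequality constraints, to be optimal; see for example [35]. Assume $\mathbb{R}(\mathbf{w}): [0,1]^K \to [0,1]$ is continuously differentiable. It can be verified that the constant rank constraint qualification (CRCQ) [35] holds for (51); then the local optimum $\mathbf{w}^*$ satisfies

$$\nabla\mathbb{R}(\mathbf{w}^*) + \sum_{i=1}^{K}\nu_i\nabla(-w_i^*) + \lambda\nabla\Big(1-\sum_{i=1}^{K} w_i^*\Big) = 0 \qquad (52)$$

$$\sum_{i=1}^{K} w_i^* = 1 \qquad (53)$$

$$\nu_i w_i^* = 0, \qquad i = 1,\cdots,K \qquad (54)$$

for some $\lambda$ and $\nu_i \ge 0$ for $i = 1,\cdots,K$. We assume that $\mu(x)$ is strictly increasing in the derivation below. From Proposition 3, $w_j^* > 0$, thus $\nu_j = 0$ for $j = 1,\cdots,K$ due to (54). By (51) and (42) and some straightforward manipulation, the necessary conditions (52) to (54) can be simplified to

$$\mu(0) + \sum_{k=2}^{K} w_k^*\,\mu'\Big(\sum_{i=1}^{k-1} w_i^*\Big) = \lambda \qquad (55)$$

$$\mu(w_1^*) + \sum_{k=3}^{K} w_k^*\,\mu'\Big(\sum_{i=1}^{k-1} w_i^*\Big) = \lambda \qquad (56)$$

$$\vdots$$

$$\mu\Big(\sum_{i=1}^{K-2} w_i^*\Big) + w_K^*\,\mu'\Big(\sum_{i=1}^{K-1} w_i^*\Big) = \lambda \qquad (57)$$

$$\mu\Big(\sum_{i=1}^{K-1} w_i^*\Big) = \lambda \qquad (58)$$

$$\sum_{i=1}^{K} w_i^* = 1 \qquad (59)$$

$$w_i^* > 0, \qquad i = 1,\cdots,K \qquad (60)$$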

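where $\mu'$ denotes the first order derivative of $\mu$.

Define $\sigma_k = \sum_{i=1}^{k} w_i$ for $k = 1,\cdots,K$. Since $\mu$ is continuous and strictly increasing, the inverse $\mu^{-1}(y)$ exists, where $\mu(0) \le y \le \mu(1)$. From (59) and (58), for any $\mu(0) < \lambda < \mu(1)$, we find $w_K$ as

$$\sigma_{K-1} = \mu^{-1}(\lambda) \qquad (61)$$

$$w_K = \max(1-\sigma_{K-1},\,0) \qquad (62)$$

and find $w_{i-1}$ for $i = K,\cdots,2$ recursively as

$$\sigma_{i-2} = \mu^{-1}\big(\mu(\sigma_{i-1}) - w_i\,\mu'(\sigma_{i-1})\big)$$

$$w_{i-1} = \max(\sigma_{i-1}-\sigma_{i-2},\,0). \qquad (63)$$

The set of local optimal points $\{\mathbf{w}^*\}$ is given by $\lambda$ that satisfies

$$g(\lambda) = 1 - \sum_{i=1}^{K} w_i = 0, \qquad \mu(0) < \lambda < \mu(1).$$

The global optimal solution is $\arg\max_{\mathbf{w}\in\{\mathbf{w}^*\}} \mathbb{R}(\mathbf{w})$.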

For many channels, inserting more pilot symbols has diminishing returns in the achievable rate of the data. The pilot-utility functions of these channels are concave and the optimal strategy is to allocate more weight at higher levels.

Proposition 4: If $\mu(x): [0,1]\to[0,1]$ is continuously differentiable in $(0,1)$, strictly increasing, and concave, the optimal weight distribution satisfies $w_1^* \le w_2^* \le \cdots \le w_K^*$.

Proof: From (55) to (58), we have

$$\mu\Big(\sum_{i=1}^{k-1} w_i^*\Big) + w_{k+1}^*\,\mu'\Big(\sum_{i=1}^{k} w_i^*\Big) = \mu\Big(\sum_{i=1}^{k} w_i^*\Big) \qquad (64)$$

for $k = 1,\cdots,K-1$. From the concavity of $\mu(x)$, we have

$$\mu\Big(\sum_{i=1}^{k} w_i^*\Big) \ge \mu\Big(\sum_{i=1}^{k-1} w_i^*\Big) + w_k^*\,\mu'\Big(\sum_{i=1}^{k} w_i^*\Big) \qquad (65)$$

for $k = 1,\cdots,K-1$. The combination of (64), (65), and the fact that $\mu'(x) > 0$ shows that

$$w_{k+1}^* \ge w_k^*, \qquad k = 1,\cdots,K-1.$$



C. A Construction of the Deterministic Interleaver 

Unlike random interleavers, the optimization of determin- 
istic interleavers has combinatorial complexity. Although the 
optimal placement of pilot symbols for PSAM has been 
studied in the literature [36], the problem here is more difficult 
as there are K codewords to be placed. Thus, we present a 
family of deterministic interleavers that are constructed from 
empirical rules proposed for the Markov fading channel. 

First, more weight shall be allocated to higher levels because of Proposition 4. Second, the weight of level 1 (pilot percentage) shall be optimized as it has zero achievable rate and the most mutual information loss. Optimization with respect to $w_1$ often yields the most gain. Third, it is desirable to separate the symbols within a codeword and to place the symbols from lower levels evenly around them. Accordingly, we construct a family of binary-weighted interleavers with $w_{k+1} = 2w_k$ for $k \ge 2$ as

$$\pi = [\boldsymbol{\omega},\cdots,\boldsymbol{\omega}], \qquad \boldsymbol{\omega} = [1,\mathbf{v}_K,\cdots,\mathbf{v}_K]. \qquad (66)$$

Here the vector $\mathbf{v}_K$ is defined recursively as

$$\mathbf{v}_K = [K,\mathbf{v}_{K-1}(1),K,\mathbf{v}_{K-1}(2),\cdots,K,\mathbf{v}_{K-1}(2^{K-2}-1),K]$$

for $K > 2$ and $\mathbf{v}_2 = [2]$. For example, a 3-level binary-weighted interleaver has $\mathbf{v}_3 = [3,2,3]$, a 4-level one has $\mathbf{v}_4 = [4,3,4,2,4,3,4]$, and a 5-level one has $\mathbf{v}_5 = [5,4,5,3,5,4,5,2,5,4,5,3,5,4,5]$. It is clear that the training symbols are well placed for each level. Furthermore, the weight of level 1 can be optimized by finding the optimal number of $\mathbf{v}_K$ in $\boldsymbol{\omega}$.
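
The recursive construction can be written down in a few lines. The sketch below builds the level pattern and one interleaver period as in (66); the function names `v`, `omega`, and `weights` and the `repeats` argument are illustrative choices, not notation from the paper.

```python
from collections import Counter


def v(K):
    """Level pattern v_K, defined recursively with v_2 = [2]."""
    if K < 2:
        raise ValueError("K must be at least 2")
    if K == 2:
        return [2]
    pattern = [K]
    for level in v(K - 1):
        # Place one level-K symbol after every symbol of v_{K-1}.
        pattern += [level, K]
    return pattern


def omega(K, repeats):
    """One period: a level-1 (pilot) symbol followed by `repeats` copies of v_K."""
    return [1] + v(K) * repeats


def weights(period, K):
    """Fraction of symbols assigned to each level 1..K within one period."""
    counts = Counter(period)
    return [counts[k] / len(period) for k in range(1, K + 1)]


w5 = weights(omega(5, 2), 5)
```

Here `omega(5, 2)` uses two copies of $\mathbf{v}_5$ per pilot symbol; the number of copies is the free parameter that sets the weight of level 1.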
V. Design for Iterative Estimation and Decoding

For the iterative receivers, the design techniques proposed 
in Section |IV] including interleaver optimization and code rate 
allocation are no longer optimal because the achievable rate 
becomes too conservative a performance measure. On the other 
hand, the EXIT chart based analysis [26] was shown to predict 
the convergence behavior of an iterative process very well. 
Hence, given a family of codes of various rates, we use the
EXIT chart to find the maximal code rate supported by an IED
algorithm at each subchannel and to formulate the interleaver
optimization problem.

A. EXIT Function of the Estimator 

Let $\{L^{e,\mathrm{in}}(x_{i,k})\}$ and $\{L^{e,\mathrm{out}}(x_{i,k})\}$ in $\mathbb{R}$ be the sequences of likelihood ratios (extrinsic information) at the input and the output of the estimator, respectively. They are assumed to be the realizations of i.i.d. random variables. The input and output mutual information for the estimator can be obtained from

$$I_k^{e,\mathrm{in}} = \lim_{N_k\to\infty}\ 1 + \frac{1}{N_k}\sum_{i=1}^{N_k}\log_2\frac{L^{e,\mathrm{in}}(x_{i,k})}{1+L^{e,\mathrm{in}}(x_{i,k})} \qquad (67)$$

and

$$I_k^{e,\mathrm{out}} = \lim_{N_k\to\infty}\ 1 + \frac{1}{N_k}\sum_{i=1}^{N_k}\log_2\frac{L^{e,\mathrm{out}}(x_{i,k})}{1+L^{e,\mathrm{out}}(x_{i,k})}. \qquad (68)$$

For a given channel and interleaver, the estimator EXIT function at level $k$ is

$$I_k^{e,\mathrm{out}} = T_k\big(I_k^{e,\mathrm{in}}\big). \qquad (69)$$

In order to estimate (69), a sequence $\{L^{e,\mathrm{in}}(x_{i,k})\}$, the information content of which is measured according to (67), is generated according to a given PDF with a single parameter and fed to the estimator. The estimator output $\{L^{e,\mathrm{out}}(x_{i,k})\}$ is then collected to produce an estimate of the output mutual information using (68). The entire curve of $I_k^{e,\mathrm{out}} = T_k(I_k^{e,\mathrm{in}})$ can be traced by varying the single parameter of the PDF so that $I_k^{e,\mathrm{in}}$ changes from 0 to 1.

The exact PDF of $L^{e,\mathrm{in}}(x_{i,k})$ is difficult to obtain. One commonly adopted approach is to assume that $L^{e,\mathrm{in}}(x_{i,k})$ is derived from an AWGN channel $Y = X + W$ with noise variance $E[W^2] = \sigma_w^2$ so that $L^{e,\mathrm{in}}(x_{i,k}) \sim \mathcal{N}(2/\sigma_w^2,\,4/\sigma_w^2)$. This approach will be used for deterministic interleavers. However, for random interleavers, (69) can be computed more efficiently using the pilot-utility function (41). Let $L^{e,\mathrm{in}}(x_{i,k})$ be drawn according to the following distribution:

$$L^{e,\mathrm{in}}(x_{i,k}) = \begin{cases} +\infty, & \text{with probability } x \\ 1, & \text{with probability } 1-x \end{cases}$$

which means that a symbol is completely known with probability $x$. Thus, the input mutual information is

$$I_k^{e,\mathrm{in}} = x$$

and, by the definition of $\mu(x)$ in (41), the output mutual information is

$$T_k(x) = \mu\Big(x\,w_k + \sum_{i=1}^{k-1} w_i\Big), \qquad x \in [0,1] \qquad (70)$$

where $\mathbf{w} = [w_1,\cdots,w_K]$ is the weight distribution.
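
Equation (70) makes the estimator EXIT function of a random interleaver a shifted and scaled copy of the pilot-utility function. A minimal sketch, assuming a hypothetical concave `mu` and a hypothetical 4-level weight distribution (neither is taken from the example channel):

```python
def mu(x):
    # Hypothetical pilot-utility function, used only for illustration.
    return 0.9 * x - 0.3 * x * x


def estimator_exit(x, k, w):
    """T_k(x) = mu(x * w_k + sum_{i<k} w_i), eq. (70); k is 1-based."""
    return mu(x * w[k - 1] + sum(w[:k - 1]))


w = [0.1, 0.2, 0.3, 0.4]
t_start = estimator_exit(0.0, 3, w)  # no a priori knowledge from level 3 itself
t_end = estimator_exit(1.0, 3, w)    # all level-3 symbols known
```

At $x = 0$ the function returns the SED rate of Theorem 3, and at $x = 1$ it returns $\mu$ evaluated at the cumulative weight up to and including level $k$, so the curve is flatter for levels with small weight.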

B. EXIT Function of the Decoder 

Let $C(r)$ be a code of rate $r$. Let the EXIT function of its
decoder be

$$I^{d,\mathrm{out}} = T_d\big(I^{d,\mathrm{in}}, C(r)\big).$$

Since the FSMC considered here is a fading channel with
good channel estimation at the receiver, it is convenient to
assume that the decoder input extrinsic information $L^{d,\mathrm{in}}(x_{i,k})$
is derived from a known-state fading channel $Y = HX + W$,
where $H \sim \mathcal{CN}(0,1)$ and $W \sim \mathcal{CN}(0,\sigma_w^2)$. Then $I^{d,\mathrm{in}}$ is a
function of the AWGN variance only and is equal to the i.u.d.
binary-input capacity of a known-state fading channel given
in [18]

rf,,„ A2F(Ai + l,l;Ai+2;~l) Aii^(A2, 1; A2 + 1; -1) 



(Ai+A2)(Ai + l)ln2 



A2(Ai + A2)ln2 



(71) 

where A1.2 = i ( -^Z 1 + cr^ =F 1) and F{a,b; c; z) is a hyper- 
geometric function, or a Gauss hypergeometric function. The 
output mutual information $I^{d,\mathrm{out}}$ can be measured at the soft
output of the decoder.



C. Design Using the EXIT Charts 

The EXIT chart is a diagram where the estimator EXIT function $I_k^{e,\mathrm{out}} = T_k(I_k^{e,\mathrm{in}})$ and the inverse of the decoder EXIT function $I^{d,\mathrm{in}} = T_d^{-1}(I^{d,\mathrm{out}}, C(r))$ are plotted together. The iterative process can be tracked on the EXIT chart as a flow of mutual information with initial value $I_k^{e,\mathrm{in}} = 0$. As long as $T_k(x) > T_d^{-1}(x, C(r))$ for $0 \le x < 1$, the iteration will proceed to $I^{d,\mathrm{out}} = 1$. Hence, the maximal code rate supported by the iterative estimation and decoding at a subchannel can be estimated by

$$r_k^* = \sup_r\,\{r : T_k(x) - T_d^{-1}(x, C(r)) > d_t,\ 0 \le x < 1\} \qquad (72)$$

where $d_t \ge 0$ is a design parameter that specifies the allowed minimal tunnel width between two EXIT curves. The code rate at level $k$ is then chosen to be $r_k^*$.
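
The rate selection rule (72) amounts to scanning a set of available code rates and keeping the largest one whose tunnel stays open. The sketch below assumes a hypothetical pilot-utility function `mu` and, importantly, a placeholder `decoder_inverse_exit` curve that is not the EXIT function of any real LDPC code; only the rate grid (0.1 to 0.7 in steps of 0.01) follows the component-code set used in this paper.

```python
def mu(x):
    # Hypothetical pilot-utility function, used only for illustration.
    return 0.9 * x - 0.3 * x * x


def estimator_exit(x, w_k, sigma_prev):
    # T_k(x) from eq. (70); sigma_prev is the total weight of the lower levels.
    return mu(x * w_k + sigma_prev)


def decoder_inverse_exit(x, r):
    # Placeholder for T_d^{-1}(x, C(r)); a real design would use measured curves.
    return r * (0.8 + 0.2 * x)


def max_supported_rate(w_k, sigma_prev, d_t=0.0, grid=200):
    """Largest rate on the grid whose tunnel is wider than d_t for 0 <= x < 1."""
    best = None
    for i in range(10, 71):          # rates 0.10, 0.11, ..., 0.70
        r = i / 100
        tunnel_open = all(
            estimator_exit(j / grid, w_k, sigma_prev)
            - decoder_inverse_exit(j / grid, r) > d_t
            for j in range(grid)     # x = 0, 1/grid, ..., (grid - 1)/grid
        )
        if tunnel_open:
            best = r
    return best


r_star = max_supported_rate(w_k=0.5, sigma_prev=0.5)
```

With these toy curves the supported rate exceeds the SED rate `mu(0.5)` of the same level, which mirrors the gain that iterative estimation and decoding is meant to provide.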

Therefore, we can maximize the overall code rate by match- 
ing the estimator EXIT function at each level to the code EXIT 
function through interleaver design. For random interleavers, 
it is the following weight distribution optimization problem: 

$$\max_{\mathbf{w}}\ \sum_{k=1}^{K} w_k\,r_k^* \qquad (73)$$

subject to

$$\sum_{k=1}^{K} w_k = 1$$

and

$$0 \le w_k \le 1, \qquad k = 1,\cdots,K$$

where $r_k^*$ is given in (72). Note that it is implicitly assumed in (73) that the code EXIT function does not depend on $w_k$ or, equivalently, $N_k$. The justification is that in the case of $N_k \to \infty$, the dependency of the code EXIT function on $N_k$ becomes rather weak.

VI. Performance Analysis under the 
Finite-Length Constraint 

If the overall delay of a successive decoding scheme is 
finite, there is a tradeoff between the number of levels and the 
codeword length of each level. Clearly, for an infinite N it is 
always beneficial to increase K, while for a very small N, 
the best strategy is to use no more than one code with some 
training symbols, as observed in [37]. This section provides an 
analysis of the finite-length effect on the SED based schemes 
by relating the word-error probability to the codeword length 
using the random-coding bound [27]. 

Let $X$ and $Y$, respectively, be the input and output of a memoryless channel and $\Pr(Y|X)$ be the channel transition PDF. The results in [27] state that the error probability of maximum-likelihood decoding of a length-$N$ block code of rate $r$ is upper bounded by

$$P_e \le 2^{-N E^r(r)} \qquad (74)$$

where

$$E^r(r) = \max_{0\le\rho\le1}\big(E^0(\rho) - \rho r\big) \qquad (75)$$

is the random-coding error exponent and

$$E^0(\rho) = -\log_2\int_Y\Big(\sum_{x=\pm1}\Pr(x)\Pr(y|x)^{\frac{1}{1+\rho}}\Big)^{1+\rho}dy. \qquad (76)$$


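Consider a successive decoding scheme using SED and a fixed interleaver $\pi$. Under the SED rule, the $k$th subchannel is treated as a memoryless channel and the probabilities $\{\Pr(\mathbf{Y}|X_{i,k} = a,\mathbf{U}_k)\}_{i=1}^{N_k}$ are assumed to be independent. According to (75) and (76), the random-coding error exponent at the subchannel $k$ for $k = 1,\cdots,K$ is

$$E_k^r(r) = \max_{0\le\rho\le1}\big(E_k^0(\rho,\pi) - \rho r\big) \qquad (77)$$

where

$$E_k^0(\rho,\pi) = -\frac{1}{N_k}\sum_{i=1}^{N_k}\log_2\int\Big(\sum_{a=\pm1}\frac{1}{2}\Pr(\mathbf{Y}|X_{i,k}=a,\mathbf{U}_k)^{\frac{1}{1+\rho}}\Big)^{1+\rho}d\mathbf{Y}$$

$$= -\frac{1}{N_k}\sum_{i=1}^{N_k}\log_2\Big(2^{-\rho}\,E_{\mathbf{Y}}\Big[\Big(\sum_{a=\pm1}\Pr(X_{i,k}=a|\mathbf{Y},\mathbf{U}_k)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big]\Big). \qquad (78)$$
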



For random interleavers, we take the expectation over $\Pi$ to obtain

$$E_k^r(r) = \max_{0\le\rho\le1}\big(E_k^0(\rho) - \rho r\big) \qquad (79)$$

where

$$E_k^0(\rho) = E_\Pi\big[E_k^0(\rho,\Pi)\big]. \qquad (80)$$

Fig. 3. The pilot-utility function for the example FSMC at $E_s/N_0$ = 3 dB.


Let the word-error probability of all levels be upper bounded by $P_e'$. Assuming the decoder at each level produces independent errors, the overall error probability of a $K$-level system is upper bounded by $P_e = 1-(1-P_e')^K \le K P_e'$. Therefore, for a given total length $N$, a specific interleaver, and a target word-error probability upper bound $P_e$, using (74) we can find the upper bound of the rate at level $k$ by solving

$$E_k^r(r_k) = -\frac{1}{N_k}\log_2\frac{P_e}{K} \qquad (81)$$

for $r_k$. Here $E_k^r(r)$ can be evaluated numerically from (77) and (78) for the deterministic interleaver and from (79) and (80) for the random interleaver. The optimal number of levels under the overall-delay constraints can be found by maximizing the upper bound of the overall rate $r = \sum_{k=1}^{K} w_k r_k$.
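
The rate bound obtained from (74) to (81) can be illustrated on a channel whose $E^0(\rho)$ has a closed form. The sketch below swaps in a binary erasure channel with uniform inputs, for which $E^0(\rho) = -\log_2(\epsilon + (1-\epsilon)2^{-\rho})$; this is a stand-in, not the subchannel of the FSMC, and the function names, the erasure probability, and the target error probability are all assumptions made for the example.

```python
import math


def e0_bec(rho, eps):
    # Gallager function E^0(rho) of a BEC with erasure probability eps
    # and uniform binary inputs (stand-in for the subchannel exponent).
    return -math.log2(eps + (1.0 - eps) * 2.0 ** (-rho))


def error_exponent(r, eps, steps=1000):
    # Random-coding exponent, eq. (75), maximised over a grid of rho in [0, 1].
    return max(e0_bec(j / steps, eps) - (j / steps) * r for j in range(steps + 1))


def max_rate(n, target_pe, levels, eps):
    """Largest r with exponent >= -(1/n) * log2(target_pe / levels), cf. eq. (81)."""
    needed = -math.log2(target_pe / levels) / n
    if error_exponent(0.0, eps) < needed:
        return 0.0                      # block too short for this target
    lo, hi = 0.0, 1.0 - eps             # the exponent vanishes at capacity 1 - eps
    for _ in range(60):                 # bisection: the exponent decreases with r
        mid = 0.5 * (lo + hi)
        if error_exponent(mid, eps) >= needed:
            lo = mid
        else:
            hi = mid
    return lo


r_short = max_rate(n=1000, target_pe=1e-3, levels=5, eps=0.5)
r_long = max_rate(n=10000, target_pe=1e-3, levels=5, eps=0.5)
```

The longer block supports a rate closer to capacity, which is the tradeoff between the number of levels and the per-level codeword length discussed in this section.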



VII. Numerical Results 
A. Example Channel 

A first-order Gauss-Markov process $h_t = \alpha h_{t-1} + \sqrt{1-\alpha^2}\,z_t$ is used as the underlying physical channel to derive the finite-state Markov process, where $z_t \sim \mathcal{CN}(0,1)$ is the driving white Gaussian process, $\alpha \in (0,1)$ determines the fading speed, and $h_t \sim \mathcal{CN}(0,1)$ is the continuous-valued complex channel gain. The channel state space $\{A_1,\cdots,A_Q\}$ is obtained by independently quantizing the real and imaginary parts of $h_t \sim \mathcal{CN}(0,1)$ using the Max-Lloyd algorithm. The state-transition probability is found by integrating the joint PDF of $h_t$ and $h_{t+1}$, and the stationary probability by integrating the PDF of $h_t$. This paper uses the example channel given by $\alpha = 0.95$ and $Q = 36$. The quantization points and boundaries are respectively {±1.339, ±0.707, ±0.225} and {±6, ±1.023, ±0.466, 0} for both dimensions. The FSMC considered here models rather accurately a flat-fading channel with both random phase rotation and magnitude fluctuation.
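
The example channel can be reproduced with a short simulation. The sketch below generates the Gauss-Markov gains and maps each gain to one of the 36 states using the quantization points and interior boundaries quoted above; the state indexing (`6 * i_re + i_im`), the treatment of the outermost cells as unbounded, and the function names are assumptions made for illustration.

```python
import bisect
import math
import random

ALPHA = 0.95
POINTS = [-1.339, -0.707, -0.225, 0.225, 0.707, 1.339]
BOUNDS = [-1.023, -0.466, 0.0, 0.466, 1.023]  # interior decision boundaries


def complex_gaussian(rng):
    # CN(0, 1): real and imaginary parts are independent with variance 1/2.
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate_gains(n, rng, alpha=ALPHA):
    # h_t = alpha * h_{t-1} + sqrt(1 - alpha^2) * z_t, started in steady state.
    h = complex_gaussian(rng)
    gains = []
    for _ in range(n):
        gains.append(h)
        h = alpha * h + math.sqrt(1.0 - alpha ** 2) * complex_gaussian(rng)
    return gains


def quantize(h):
    """Return (state index in 0..35, quantized gain) for a complex gain h."""
    i_re = bisect.bisect_right(BOUNDS, h.real)
    i_im = bisect.bisect_right(BOUNDS, h.imag)
    return 6 * i_re + i_im, complex(POINTS[i_re], POINTS[i_im])


gains = simulate_gains(20000, random.Random(1))
avg_power = sum(abs(h) ** 2 for h in gains) / len(gains)
states = [quantize(h)[0] for h in gains]
```

Counting transitions between consecutive entries of `states` would give an empirical estimate of the state-transition probabilities, as an alternative to integrating the joint PDF.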
Fig. 4. The achievable rates for various interleavers under the separate
estimation and decoding.



B. Component Codes 

The component codes for each level are drawn from a 
set of irregular LDPC codes optimized for AWGN channels 
with rates from 0.1 to 0.7 with a step of 0.01. Their degree 
polynomials are generated by LDPCopt [25]. The decoder 
uses the message-passing algorithm with 20 iterations. The 
codeword length is chosen to be proportional to the weight 
of each level and is sufficiently long. For SED, the code
rate is chosen to be the achievable rate, i.e., $r_k = R_k$ for
the deterministic interleaver and $r_k = \mathbb{R}_k$ for the random
interleaver. For IED, the code rates are chosen according to
(72), $r_k = r_k^*$, where the tunnel width $d_t = 0$. These rates are
then rounded to the nearest available code rates. Note that, in some
cases, the first few levels have a small codeword length. We
then lower the code rates appropriately, usually by 0.01 to 0.03,
to compensate for it.
Fig. 5. Error performance comparison of various coding schemes.

TABLE I
Code rates and lengths for random interleavers in the SED scheme.

Level   Overall rate   Code rate and length at individual levels
K=2     0.5590         rate     -      0.65
                       length   35K    215K
K=3     0.5529         rate     -      0.48    0.63
                       length   21K    66K     213K
K=4     0.5556         rate     -      0.37    0.56    0.62
                       length   20K    50K     130K    300K
K=5     0.5559         rate     -      0.28    0.5     0.59    0.63
                       length   32K    63K     130K    270K    505K
K=10    0.5573         rate     -      0.16    0.31    0.40    0.49
                       length   21K    29K     40.5K   58.5K   85.5K
                       rate     0.54   0.56    0.59    0.61    0.62
                       length   123K   181.5K  244.5K  337.5K  379.5K


C. Design Results for Separate Estimation and Decoding 

This section presents design examples of successive decod- 
ing with SED. Both the achievable rates and the BERs of 
actual coding implementation show that the proposed random 
and binary-weighted interleaver have significant performance 
gain over the traditional rectangular interleavers and the 
PSAM. 

The pilot-utility function $\mu(x)$ is estimated for the above channel at $E_s/N_0 = 3$ dB as plotted in Fig. 3. Based on $\mu(x)$, we obtain the weight distributions of random interleavers for $K = 2,\cdots,32$ by solving equations (52) to (54). We also find that the optimal repetition of $\mathbf{v}_K$ for the binary-weighted interleaver (66) is 9, 5, 3, and 2, respectively, for $K = 2, 3, 4, 5$ and is 1 for $K \ge 6$. Their achievable rates are plotted in Fig. 4. For comparison, Fig. 4 also shows the i.u.d. capacity, the achievable rates of a $K$-level rectangular interleaver and the PSAM. The PSAM is configured to have 1 pilot symbol for every $K-1$ data symbols and in this case the x-axis of Fig. 4 is the ratio of the total number of symbols to the number of pilot symbols.



As shown in Fig. 4, the fundamental problem of the PSAM is that the pilot symbols useful for state estimation reduce the overall rate. The successive decoding resolves this problem. The achievable rates of all types of the interleavers considered here are shown to approach $C^{\mathrm{i.u.d.}}$ exponentially fast as $K$ increases. At small $K$ the optimized random interleaver and the binary-weighted interleaver have significant performance gain over the rectangular one. For comparison, at $K = 3$, the rectangular, the random, and the binary-weighted interleaver achieve, respectively, 79.8%, 88.9%, and 93.5% of the i.u.d. capacity. In order to achieve 95% of the i.u.d. capacity, they would require 11, 6, and 4 levels, respectively. This illustrates the effectiveness of the proposed design for finite $K$. It shall be noted that although the binary-weighted interleaver is shown to outperform the optimized random interleaver in Fig. 4, this result may vary for a different channel because the random interleaver has more degrees of freedom for optimization.

For a fair comparison in the code simulation, the target overall rate of all schemes is set to 0.56. The weight distributions for the random interleavers are optimized for $E_s/N_0 = 3$ dB. The code rates and lengths are shown in Table I and II for the random interleaver and the binary-weighted interleaver, respectively, in the ascending order of $k = 1,\cdots,K$ from left to right in each row. Note that both the code rate and length increase with the level $k$. The deep rectangular interleaver serves as a benchmark and is designed according to [18] with sufficiently many levels, and each level uses the same code of rate 0.56 and length 200K. An optimized PSAM, according to Fig. 4, with 1 pilot for every 9 data symbols and a code of rate 0.62 is also considered.

TABLE II
Code rates and lengths for binary-weighted interleavers in the SED scheme.

TABLE III
Code rates and lengths for different interleavers in the IED schemes.

Interleavers   Overall rate   Code rate and length at individual levels
random         0.5108         rate     -      0.22    0.41    0.51    0.57
                              length   10K    20K     55K     135K    280K
bin-weight     0.5200         rate     -      0.37    0.49    0.54    0.57
                              length   20K    40K     80K     160K    320K
rectangular    0.4440         rate     -      0.54    0.55    0.56    0.57
                              length   200K   200K    200K    200K    200K
PSAM           0.4680         rate     -      0.52
                              length   20K    180K


Their BER performance is shown in Fig. 5. Both the 5-level binary-weighted interleaver and the 10-level random interleaver perform very close (within 0.3 dB) to the asymptotic deep rectangular interleaver and are around 1.3 dB to the i.u.d. capacity. They have a gain of around 2 dB over PSAM. Note that a 2-level random interleaver performs worse than the PSAM. These results match the achievable rate results in Fig. 4 very well, confirming the good performance of both the random interleaver and the binary-weighted interleaver.

D. Design Results for Iterative Estimation and Decoding 

In the following, we design the IED schemes for various interleavers to maximize the communication rate at a target $E_s/N_0$ of 3 dB. To obtain the weight distribution of the random interleavers, we first estimate the EXIT functions $I^{d,\mathrm{out}} = T_d(I^{d,\mathrm{in}}, C(r))$ of the given set of irregular LDPC codes with rates $r = 0.1, 0.11, \cdots, 0.7$ and length 100K. The estimator EXIT functions $T_k$ for $k = 1,\cdots,K$ are given by (70). We then numerically solve (73), where the tunnel width in (72) is set to $d_t = 0$, using the Matlab fmincon function for 200 random initial values. For other schemes, the estimator EXIT charts are computed based on the Gaussian approximated log-likelihood ratio. The resulting overall code rates are plotted in Fig. 6. It shows that the successive decoding with a random interleaver greatly outperforms the rectangular interleaver and PSAM. Compared with SED in Fig. 4, the rate loss of using a small $K$ is much smaller here. In fact, a random interleaver with only 6 levels has a rate very close to its asymptotic case.

Fig. 6. The overall LDPC code rates supported by the iterative estimation and decoding receiver at $E_s/N_0$ = 3 dB.

The codes for various 5-level successive decoding schemes and the optimized PSAM are specified in Table III. The set of EXIT charts used for code rate selection and optimization is shown in Fig. 7 to 10. For the binary-weighted interleaver, the EXIT functions of the estimator and the decoder match very well, predicting its best performance among all schemes. For the random interleaver, the EXIT chart matches better as the level gets higher, especially at the highest two levels. This explains the good performance for random interleavers. On the other hand, the rectangular interleaver has rather flat estimator EXIT functions and thus little IED gain. Fig. 11 to 14 show the coding results of the above designs in reference to the SNR at which the i.u.d. capacity is equal to the overall rate. All schemes have BER of 10^^ at around 3.3 dB. Both the random and the binary-weighted interleavers have around 0.6 dB gain from IED, and are, respectively, around 1.1 dB and 1 dB to the i.u.d. capacity. The PSAM also has around 0.6 dB gain for iterative receivers; however, it is around 2 dB to the i.u.d. capacity due to the rate loss of 10% pilot symbols. The performance of a 5-level rectangular interleaver is 2.5 dB to the i.u.d. capacity.

E. Results for Finite-Length Analysis 

Here we show the performance bound of successive decoding with a delay constraint using the random-coding error exponent analysis in Section VI. Both the rectangular and the random interleaver are considered. We compute the error exponents for the rectangular and the random interleaver according to (77) and (79), respectively, using Monte-Carlo simulation. The overall rate $r = \sum_{k=1}^{K} w_k r_k$, where $r_k$ is computed from (81) with $P_e$ = 10^^, is shown in Fig. 15 and 16 for the rectangular and the random interleaver, respectively. In the extreme case of $N = 100$, it is clearly best to use only one code and some pilot symbols. Otherwise, it is usually beneficial to use more than two levels and there is an optimal number of levels for short to moderate block lengths for both types of interleavers. The random interleaver is shown to have a higher overall rate than the rectangular interleaver, especially for small $N$.

Fig. 7. EXIT charts of the optimized random interleaver.

Fig. 8. EXIT charts of the binary-weighted interleaver.

Fig. 9. EXIT charts of the rectangular interleaver.

Fig. 10. EXIT charts of PSAM.

VIII. Concluding Remarks 

In this paper, we have proposed and analyzed new designs
of successive decoding schemes with finite levels. The main
techniques are a flexible interleaver structure and iterative
estimation and decoding within each level. Both the random
and the binary-weighted interleaver are constructed to have
near i.u.d. capacity performance with as few as five levels.
Using irregular LDPC codes, an optimized 10-level random
interleaver using SED performs very close to the deep rectangular
interleaver, and a 5-level random interleaver using IED
is less than 1.1 dB away from the i.u.d. capacity. These results
show that successive decoding is not only asymptotically 
optimal but also attractive for practical systems. The proposed 
random interleaver also provides some interesting insight into 
the channel mutual information. We have also analyzed the 
performance of successive decoding under an overall-delay 
constraint based on the random-coding error exponent. We 
showed that using multiple levels is useful for a moderate 
delay constraint and an optimal number of levels can be found. 

The interleaved K LDPC codes used here can be viewed 
as a compound LDPC code and the successive decoding as 
a special message-passing schedule. Thus, we have in effect 
obtained a procedure to construct an irregular LDPC code that 
can approach the i.u.d. capacity of a realistic channel with 
memory from a set of AWGN optimized degree profiles. This 
may suggest a new approach to designing good degree profiles 
for LDPC codes over channels with memory using the idea of 
successive decoding. 
Fig. 11. BER of weight-optimized random interleaver.

Fig. 12. BER of binary-weighted interleaver.

Fig. 13. BER of rectangular interleaver.



One possible extension of the current scheme, especially for 
systems with a stringent delay constraint, is to allow the K 
decoders to exchange soft information similar to [15] and [16]. 
Although it is suggested in [15] and [16] that the component 
codes may be designed based on the original system with 
perfect decision feedback, more performance gain can be 
expected if we can find new code-optimization techniques 
that take into account the iteration between different levels. 
In practice, the joint construction of the short code and the 
interleaver under a small delay constraint may further improve 
the performance. 
Fig. 14. BER of PSAM.

Fig. 15. Performance of rectangular interleavers with finite-length constraint.

Fig. 16. Performance of random interleavers with finite-length constraint.

Appendix I
Proof of Lemma 1

Let $\Lambda(x_t)$ and $\Lambda'(x_t)$, respectively, be the likelihood ratio of $X_t$ computed using forward recursion windows of $m$ and $m'$ and a backward recursion window of $n$. By the definition of the mutual information, the difference between the mutual information of $X_t$ obtained with the forward windows $m'$ and $m$ is

$$\frac{1}{\ln 2}\,E\left[\ln\left(\frac{1+\Lambda(x_t)}{1+\Lambda'(x_t)}\cdot\frac{\Lambda'(x_t)}{\Lambda(x_t)}\right)\right]. \qquad (82)$$

Let $\alpha_t(q)$ and $\alpha'_t(q)$ be the forward state probabilities computed from windows of $m$ and $m'$, respectively. Let $\beta_t(q)$ be the backward state probability computed from a window of $n$.



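By (…), we have 

$$ \frac{\Lambda'(x_t)}{1+\Lambda'(x_t)}\cdot\frac{1+\Lambda(x_t)}{\Lambda(x_t)} \;=\; \frac{\cdots}{\cdots}\cdot\frac{\sum_{q=1}^{Q}\sum_{q'=1}^{Q}\alpha_t(q)\,\gamma_t(q',q)\,\beta_{t+1}(q')}{\sum_{q=1}^{Q}\sum_{q'=1}^{Q}\alpha'_t(q)\,\gamma_t(q',q)\,\beta_{t+1}(q')}. \qquad (83) $$

Define a diagonal matrix 

$$ D_t = \mathrm{diag}\{D_t(1), \ldots, D_t(Q)\} $$

where 

$$ D_t(q) = \begin{cases} \displaystyle\sum_{a\in\{-1,+1\}} \Pr(y_t \mid h_t = A_q,\, x_t = a)\,\Pr(x_t = a), & \text{if } x_t \text{ is unknown} \\[2ex] \Pr(y_t \mid h_t = A_q,\, x_t)\,\Pr(x_t), & \text{if } x_t \text{ is known.} \end{cases} $$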

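Define $\boldsymbol{\alpha}_t = [\alpha_t(1), \ldots, \alpha_t(Q)]^T$ and $\boldsymbol{\alpha}'_t = [\alpha'_t(1), \ldots, \alpha'_t(Q)]^T$. We can write the recursion formulas (…) and (9) in matrix form as 

$$ \boldsymbol{\alpha}_t = PD_{t-1}PD_{t-2}\cdots PD_{t-m}\,\boldsymbol{\alpha}_{t-m} $$

$$ \boldsymbol{\alpha}'_t = PD_{t-1}PD_{t-2}\cdots PD_{t-m}\,\boldsymbol{\alpha}'_{t-m}. $$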
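The matrix form of the forward recursion can be sketched numerically as follows. This is only an illustration under assumptions that are not taken from the paper: a hypothetical two-state channel, a real-valued model $y_t = h_t x_t + \text{noise}$ with Gaussian noise of variance `sigma2`, equiprobable symbols, and the column convention `P[q, q']` for the transition from state `q'` to state `q`. The helper names `d_diagonal` and `forward_vector` are invented for this sketch.

```python
import numpy as np

def gaussian_pdf(y, mean, sigma2):
    """Density of a real Gaussian with the given mean and variance."""
    return np.exp(-(y - mean) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def d_diagonal(y, gains, sigma2, x=None):
    """Diagonal of D_t for one observation y (assumed Gaussian model).

    x=None -> x_t unknown: sum over a in {-1, +1} with Pr(a) = 1/2.
    x=+/-1 -> x_t known:   Pr(y | h = A_q, x) * Pr(x), with Pr(x) = 1/2.
    """
    gains = np.asarray(gains, dtype=float)
    if x is None:
        return 0.5 * gaussian_pdf(y, gains, sigma2) + 0.5 * gaussian_pdf(y, -gains, sigma2)
    return 0.5 * gaussian_pdf(y, gains * x, sigma2)

def forward_vector(P, diagonals, alpha_start):
    """Apply alpha <- P D alpha for each diagonal, oldest observation first."""
    alpha = np.asarray(alpha_start, dtype=float)
    for diag in diagonals:
        alpha = P @ (diag * alpha)
    return alpha

# Hypothetical two-state channel; columns of P sum to one.
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])   # stationary vector, P @ pi == pi
gains = np.array([1.0, 0.3])            # illustrative state values A_q
sigma2 = 0.5

ys = [0.8, -0.2, 1.1]                   # y_{t-3}, y_{t-2}, y_{t-1}
xs = [+1, None, +1]                     # the middle symbol is treated as unknown
diagonals = [d_diagonal(y, gains, sigma2, x) for y, x in zip(ys, xs)]

alpha_t = forward_vector(P, diagonals, pi)

# Same quantity written as the explicit product P D_{t-1} P D_{t-2} P D_{t-3} pi.
explicit = pi.copy()
for diag in diagonals:
    explicit = P @ np.diag(diag) @ explicit

print(alpha_t, explicit)
```

Because every factor has strictly positive entries, the resulting vector is strictly positive, which is the property the proof relies on next.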



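In the above, $\boldsymbol{\alpha}_{t-m} = \boldsymbol{\pi}$ and $\boldsymbol{\alpha}'_{t-m} = PD_{t-m-1}\cdots PD_{t-m'}\,\boldsymbol{\pi}$, where $P$ is the state-transition matrix and $\boldsymbol{\pi} = [\pi_1, \ldots, \pi_Q]^T$ is the vector of stationary state probabilities. Since it is assumed that $P > 0$, $D_t > 0$, and $\boldsymbol{\pi} > 0$, the forward state probability vectors are strictly positive, $\boldsymbol{\alpha}_t > 0$ and $\boldsymbol{\alpha}'_t > 0$, for any $t$. Therefore, applying the following inequality 

$$ \frac{\sum_i u(i)}{\sum_i v(i)} \le \max_i \frac{u(i)}{v(i)}, \quad \text{where } u(i) > 0,\ v(i) > 0 \qquad (84) $$

to (83) yields 

$$ \frac{\Lambda'(x_t)}{1+\Lambda'(x_t)}\cdot\frac{1+\Lambda(x_t)}{\Lambda(x_t)} \le \max_{1\le q\le Q}\frac{\alpha'_t(q)}{\alpha_t(q)}\cdot\max_{1\le q\le Q}\frac{\alpha_t(q)}{\alpha'_t(q)}. \qquad (85) $$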


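Taking the logarithm of both sides of (85) and applying the definition of the Hilbert metric, we have 

$$ \ln\!\left(\frac{\Lambda'(x_t)}{1+\Lambda'(x_t)}\cdot\frac{1+\Lambda(x_t)}{\Lambda(x_t)}\right) \le d(\boldsymbol{\alpha}_t, \boldsymbol{\alpha}'_t). \qquad (86) $$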



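From the property of products of nonnegative matrices [31] that the multiplication of a positive matrix $M > 0$ and positive vectors $u > 0$ and $v > 0$ is a strict contraction with respect to the Hilbert metric, $d(Mu, Mv) \le \tau(M)\,d(u, v)$, the right-hand side of (86) is upper bounded as 

$$ \begin{aligned} d(\boldsymbol{\alpha}_t, \boldsymbol{\alpha}'_t) &\le \tau(PD_{t-1}PD_{t-2}\cdots PD_{t-m+1}) \times d(PD_{t-m}\boldsymbol{\alpha}_{t-m},\, PD_{t-m}\boldsymbol{\alpha}'_{t-m}) \\ &\le \tau(PD_{t-1})\,\tau(PD_{t-2})\cdots\tau(PD_{t-m+1}) \times d(PD_{t-m}\boldsymbol{\alpha}_{t-m},\, PD_{t-m}\boldsymbol{\alpha}'_{t-m}) \qquad (87) \\ &= \tau(P)^{m-1}\, d(PD_{t-m}\boldsymbol{\alpha}_{t-m},\, PD_{t-m}\boldsymbol{\alpha}'_{t-m}) \qquad (88) \end{aligned} $$

where (87) is due to the property of the Birkhoff contraction coefficient that $\tau(M_1 M_2) \le \tau(M_1)\,\tau(M_2)$ for $M_1 > 0$ and $M_2 > 0$, and (88) follows from $\tau(M_1 D) = \tau(M_1)$ for a diagonal matrix $D$ with positive diagonal entries. 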
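The three properties of the Hilbert metric and the Birkhoff contraction coefficient used in this step can be checked numerically. The sketch below assumes the Hilbert metric $d(u,v) = \ln\bigl(\max_i \tfrac{u(i)}{v(i)} \big/ \min_j \tfrac{u(j)}{v(j)}\bigr)$ and uses the standard cross-ratio expression $\tau(M) = (1-\sqrt{\phi})/(1+\sqrt{\phi})$ with $\phi = \min_{i,j,k,l} \tfrac{M_{ik}M_{jl}}{M_{jk}M_{il}}$, which is a textbook formula rather than one quoted from this paper. The matrices and vectors are random placeholders, and the function names are illustrative.

```python
import numpy as np

def hilbert_metric(u, v):
    """Hilbert projective metric between two strictly positive vectors."""
    r = np.asarray(u, dtype=float) / np.asarray(v, dtype=float)
    return float(np.log(r.max() / r.min()))

def birkhoff_coefficient(M):
    """Birkhoff contraction coefficient of a strictly positive matrix."""
    M = np.asarray(M, dtype=float)
    n_rows, n_cols = M.shape
    phi = np.inf
    for i in range(n_rows):
        for j in range(n_rows):
            for k in range(n_cols):
                for l in range(n_cols):
                    phi = min(phi, M[i, k] * M[j, l] / (M[j, k] * M[i, l]))
    s = np.sqrt(phi)
    return (1.0 - s) / (1.0 + s)

rng = np.random.default_rng(0)
M1 = rng.uniform(0.1, 1.0, size=(3, 3))
M2 = rng.uniform(0.1, 1.0, size=(3, 3))
D = np.diag(rng.uniform(0.1, 1.0, size=3))   # positive diagonal matrix
u = rng.uniform(0.1, 1.0, size=3)
v = rng.uniform(0.1, 1.0, size=3)

tol = 1e-12  # slack for floating-point rounding only

# Contraction: d(Mu, Mv) <= tau(M) d(u, v).
contraction_ok = hilbert_metric(M1 @ u, M1 @ v) <= birkhoff_coefficient(M1) * hilbert_metric(u, v) + tol

# Submultiplicativity: tau(M1 M2) <= tau(M1) tau(M2).
submult_ok = birkhoff_coefficient(M1 @ M2) <= birkhoff_coefficient(M1) * birkhoff_coefficient(M2) + tol

# Invariance under positive column scaling: tau(M1 D) = tau(M1).
diag_ok = bool(np.isclose(birkhoff_coefficient(M1 @ D), birkhoff_coefficient(M1)))

print(contraction_ok, submult_ok, diag_ok)
```

The invariance under $D$ is visible in the cross-ratio itself: scaling column $k$ by $D(k)$ multiplies numerator and denominator by the same factor $D(k)D(l)$.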

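In the following, we derive the upper bound for the term $d(PD_{t-m}\boldsymbol{\alpha}_{t-m},\, PD_{t-m}\boldsymbol{\alpha}'_{t-m})$. Let $a = [a(1), \ldots, a(Q)]^T$ and $a' = [a'(1), \ldots, a'(Q)]^T$ be some positive vectors, and let $p_i$ denote the $i$th column of $P$. By the definition (26), we have 

$$ \begin{aligned} d\Bigl(\sum_{i=1}^{Q} a(i)\,p_i,\; a'(j)\,p_j\Bigr) &= \ln\!\left(\frac{\displaystyle\max_q \frac{\sum_{i=1}^{Q} a(i)P(q,i)}{a'(j)P(q,j)}}{\displaystyle\min_k \frac{\sum_{i=1}^{Q} a(i)P(k,i)}{a'(j)P(k,j)}}\right) \\ &\le \ln\!\left(\frac{\displaystyle\sum_{i=1}^{Q}\max_q \frac{a(i)P(q,i)}{a'(j)P(q,j)}}{\displaystyle\sum_{i=1}^{Q}\min_k \frac{a(i)P(k,i)}{a'(j)P(k,j)}}\right) \qquad (89) \\ &\le \ln \max_i \left(\max_q \frac{a(i)P(q,i)}{a'(j)P(q,j)} \Big/ \min_k \frac{a(i)P(k,i)}{a'(j)P(k,j)}\right) \qquad (90) \\ &= \max_i d(p_i, p_j) \qquad (91) \end{aligned} $$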



where (89) is derived by moving the max and min operators inside the summation, (90) follows from the inequality (84), and (91) follows from the definition (26). Applying (91) twice, we have 

$$ d\Bigl(\sum_{i=1}^{Q} a(i)\,p_i,\; \sum_{j=1}^{Q} a'(j)\,p_j\Bigr) \le \max_j d\Bigl(\sum_{i=1}^{Q} a(i)\,p_i,\; p_j\Bigr) \le \max_{i,j} d(p_i, p_j). $$

Hence 

$$ d(PD_{t-m}\boldsymbol{\alpha}_{t-m},\, PD_{t-m}\boldsymbol{\alpha}'_{t-m}) \le \max_{i,j} d(p_i, p_j). \qquad (92) $$

Combining (…), and (92) completes the proof. 
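The chain of bounds in this appendix implies $d(\boldsymbol{\alpha}_t, \boldsymbol{\alpha}'_t) \le \tau(P)^{m-1} \max_{i,j} d(p_i, p_j)$, which can be checked on a toy example. The sketch below makes several assumptions that are not from the paper: a random three-state transition matrix with columns summing to one, random positive diagonals standing in for $D_t$, and the equivalent form $\tau(P) = \tanh(\Delta/4)$ with $\Delta = \max_{i,j} d(p_i, p_j)$ for the Birkhoff coefficient. All names are illustrative.

```python
import numpy as np

def hilbert_metric(u, v):
    """Hilbert projective metric between two strictly positive vectors."""
    r = np.asarray(u, dtype=float) / np.asarray(v, dtype=float)
    return float(np.log(r.max() / r.min()))

def column_diameter(P):
    """max over i, j of d(p_i, p_j), where p_i is the i-th column of P."""
    Q = P.shape[1]
    return max(hilbert_metric(P[:, i], P[:, j]) for i in range(Q) for j in range(Q))

def forward(P, diags, start):
    """alpha <- P D alpha for each diagonal (oldest first), rescaled each step.

    Rescaling avoids underflow and does not change the Hilbert metric.
    """
    alpha = np.asarray(start, dtype=float)
    for d in diags:
        alpha = P @ (d * alpha)
        alpha = alpha / alpha.sum()
    return alpha

rng = np.random.default_rng(1)
Q, m_long = 3, 12                        # m_long plays the role of m'
P = rng.uniform(0.1, 1.0, size=(Q, Q))
P = P / P.sum(axis=0, keepdims=True)     # column-stochastic, strictly positive

pi = np.full(Q, 1.0 / Q)                 # stationary vector by power iteration
for _ in range(500):
    pi = P @ pi

diags = rng.uniform(0.1, 1.0, size=(m_long, Q))  # stand-ins for D_{t-m'}, ..., D_{t-1}

delta = column_diameter(P)
tau = np.tanh(delta / 4.0)               # Birkhoff coefficient of P (equivalent form)

alpha_long = forward(P, diags, pi)       # window m' = m_long
results = []
for m in range(1, 8):
    alpha_short = forward(P, diags[m_long - m:], pi)   # window m
    dist = hilbert_metric(alpha_short, alpha_long)
    bound = tau ** (m - 1) * delta
    results.append((m, dist, bound))
    print(f"m={m}: d={dist:.3e}  bound={bound:.3e}")
```

The printed distance shrinks geometrically with the window length $m$, in line with the $\tau(P)^{m-1}$ factor in the bound.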



Appendix II 
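Proof of the monotonicity of the pilot-utility function 

For convenience, we rewrite the definition of the pilot-utility function here: 

$$ \mu(x) = \lim_{n\to\infty} I_n\bigl(X_t;\, (Y)_{t-n}^{t+n} \,\big|\, (Z)_{t-n}^{t-1},\, (Z)_{t+1}^{t+n}\bigr) \qquad (93) $$

where 

$$ Z_t = \begin{cases} X_t, & \text{with probability } x \\ \phi, & \text{with probability } 1 - x \end{cases} $$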

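is the random training symbol. Let $0 \le x < y \le 1$. We define an additional sequence of training symbols $\hat{Z}$, where 

$$ \hat{Z}_t = \begin{cases} \phi, & \text{if } Z_t = X_t \\ X_t, & \text{with probability } \dfrac{y-x}{1-x} \text{ if } Z_t = \phi \\ \phi, & \text{with probability } 1 - \dfrac{y-x}{1-x} \text{ if } Z_t = \phi \end{cases} $$

and $Z' = Z \vee \hat{Z}$. It can be shown that $Z'_t$ is i.i.d. with probability distribution $\Pr(Z'_t = X_t) = y$ and $\Pr(Z'_t = \phi) = 1 - y$. Therefore 

$$ \mu(y) = \lim_{n\to\infty} I_n\bigl(X_t;\, (Y)_{t-n}^{t+n} \,\big|\, (Z')_{t-n}^{t-1},\, (Z')_{t+1}^{t+n}\bigr). \qquad (94) $$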
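The claim that the merged training sequence reveals each symbol with probability $y$ can be checked with a short simulation. The sketch assumes that, at positions left unknown by the first sequence, the additional sequence reveals the symbol with probability $(y-x)/(1-x)$; this is the value for which $x + (1-x)\cdot\frac{y-x}{1-x} = y$. The function name and the chosen values of `x`, `y`, and the sample size are illustrative only.

```python
import numpy as np

def merged_pilot_fraction(x, y, n, seed=0):
    """Fraction of positions revealed after merging two random pilot patterns.

    First stage:  each position is revealed with probability x.
    Second stage: each still-unknown position is revealed with
                  probability (y - x) / (1 - x)  (assumed value).
    """
    rng = np.random.default_rng(seed)
    first = rng.random(n) < x
    p_extra = (y - x) / (1.0 - x)
    second = (~first) & (rng.random(n) < p_extra)
    merged = first | second              # position known if either stage reveals it
    return float(merged.mean())

x, y = 0.2, 0.5
estimate = merged_pilot_fraction(x, y, 200_000)
exact = x + (1.0 - x) * (y - x) / (1.0 - x)

print(f"simulated: {estimate:.4f}, exact: {exact:.4f}, target y: {y}")
```

Each position is treated independently, so the merged pattern is again i.i.d., which is what allows the pilot-utility function to be evaluated at $y$ with the merged sequence.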

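From (93) and (94), the chain rule, and the non-negativity of the mutual information, we have 

$$ \mu(y) - \mu(x) = \lim_{n\to\infty} I_n\bigl(X_t;\, (\hat{Z})_{t-n}^{t-1},\, (\hat{Z})_{t+1}^{t+n} \,\big|\, (Y)_{t-n}^{t+n},\, (Z)_{t-n}^{t-1},\, (Z)_{t+1}^{t+n}\bigr) \ge 0 \qquad (95) $$

where $\hat{Z}$ denotes the additional sequence of training symbols. The chain rule is applied to the mutual information between $X_t$ and the pair formed by the observations and the additional training symbols, given the original training symbols; with independent inputs, the training symbols at times other than $t$ carry no information about $X_t$ on their own, so only the conditional term in (95) remains. 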
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